22438, 23553, 25278, AND 26212 NOVEL HUMAN SULFATASES

FIELD OF THE INVENTION 

The present invention relates to newly identified human sulfatases. In particular, 
the invention relates to sulfatase polypeptides and polynucleotides, methods of detecting 
the sulfatase polypeptides and polynucleotides, and methods of diagnosing and treating 
sulfatase-related disorders. Also provided are vectors, host cells, and recombinant 
methods for making and using the novel molecules.

BACKGROUND OF THE INVENTION 

The biology and functions of the reversible sulfation pathway catalyzed by
human sulfotransferases and sulfatases have been reviewed by Coughtrie et al.
(Chemico-Biological Interactions 109: 3-27 (1998)). This review, summarized below, focuses on
the sulfation of small molecules carried out by cytosolic sulfotransferases rather than the 
sulfation of macromolecules and lipids catalyzed by membrane-associated 
sulfotransferases. 

Sulfation functions in the metabolism of xenobiotic compounds, steroid
biosynthesis, and modulating the biological activity and inactivation and elimination of
potent endogenous chemicals such as thyroid hormones, steroids and catechols. This
pathway is reversible, comprising the sulfotransferase enzymes that cause the sulfation
and the sulfatases that hydrolyze the sulfate esters formed by the action of the
sulfotransferases. Accordingly, the interplay between these families regulates the
availability and biological activity of xenobiotic and endogenous chemicals. The 
sulfatases, including the arylsulfatases (ARS), are located in lysosomes or endoplasmic 
reticulum. 

The presence of sulfated components depends upon the availability of key 
members of the sulfate pathway, i.e., substrate and activated sulfate donor molecule
(co-substrate), and the balance between sulfation and sulfate conjugate hydrolysis, which
depends upon the activity and localization of the sulfotransferases and the sulfatases.
Essentially, divalent sulfate is converted to adenosine 5' phosphosulfate (APS) by
hydrolysis of ATP. This compound is in turn converted to 3' phosphoadenosine 5'
phosphosulfate (PAPS) by hydrolysis of ATP to ADP. PAPS is then converted to
adenosine 3' 5' bisphosphate concurrently with the formation of 4-nitrophenolsulfate from
4-nitrophenol. An ARS would then cleave the monovalent sulfate from the
4-nitrophenolsulfate to produce the original 4-nitrophenol. This forms the basis for the
sulfation system in humans. Over- or under-production of any of these key molecules
can result in sulfate-related disorders. For example, the brachymorphic mouse has a
connective tissue disorder that results from a defect in PAPS formation that causes
undersulfated cartilage proteoglycans.

ARS enzymes and their genes have been associated with specific genetic
diseases. ARSA is located in the lysosomes and removes sulfate from sulfated
glycolipids. A deficiency of ARSA has been associated with metachromatic
leukodystrophy and multiple sulfatase deficiency (MSD). ARSB is located in lysosomes
and has, as an endogenous substrate, dermatan sulfate and chondroitin sulfate. A
deficiency of ARSB is associated with Maroteaux-Lamy syndrome and MSD. ARSC is
located in the endoplasmic reticulum and has, as its endogenous substrate, cholesterol
sulfate and steroid sulfates. A deficiency of ARSC is associated with X-linked
ichthyosis and MSD. ARSD may be associated with MSD. ARSE has been associated
with chondrodysplasia punctata and MSD. ARSF may be associated with MSD. ARSC
hydrolyses sulfate esters on a wide range of steroids and cholesterol. ARSs also
hydrolyse sulfate conjugates of xenobiotics.

MSD results from an inability to perform a co- or post-translational modification
of a cysteine residue to serine semialdehyde (2-amino-3-oxopropionic acid). This residue is
conserved in all eukaryotic sulfatases described by Coughtrie et al. ARSC may have a
very broad specificity, extending to iodothyronine sulfates and a number of sulfate
conjugates of xenobiotic phenols.

The kinetic and catalytic properties of ARS enzymes in isolation, important for
understanding substrate specificity and the physical and chemical properties of enzymes
and substrates that allow substrate preference, have been characterized recently based on
recombinant enzyme systems. For the expression of the human sulfotransferases, COS
and V79 cells have been used. Coughtrie et al. have constructed and characterized V79
cell lines stably expressing ARSA, ARSB, and ARSC. These cell lines exhibited the
expected substrate preferences of the three enzymes among the substrates 4-nitrocatechol
sulfate, estrone sulfate, and dehydroepiandrosterone sulfate (DHEAS).

The sulfation of small molecules can be broadly divided into the areas of
chemical defense, hormone biosynthesis, and bioactivation. It was originally viewed
that sulfation protected against the toxic effects of xenobiotics in that sulfate conjugates
are more readily excreted in urine or bile and generally exhibit reduced
pharmacological/biological activity relative to the parent compound. Many drugs and
other xenobiotics are conjugated with sulfate. Many phenolic metabolites of the
cytochrome P450 mono-oxygenase system are excreted as sulfate conjugates.

Further, potent endogenous chemicals, such as steroids and catecholamines are 
found at high levels as circulating sulfate conjugates. For example, greater than 90% of 
circulating dopamine exists as the sulfated form. Sulfation is also suggested to play a 
role in the inactivation of potent steroids such as estrogens and androgens. Accordingly,
sulfation is important in metabolism and homeostasis of such compounds in humans.

DHEAS is the major circulating steroid in humans and estrone sulfate is the 
major estrogen. These chemicals act as precursors of estrogens and androgens. 
Extremely large quantities of such steroids or estrogens may occur during various stages 
of development, such as pregnancy. Estrone sulfate is a precursor for β-estradiol
synthesis. In breast cancer cells it is hydrolysed by steroid sulfatase (ARSC) to estrone,
which is then converted to β-estradiol by action of another enzyme. Accordingly, ARSC
is important for maintaining active estrogen. It is thus an important therapeutic target for 
the treatment of breast cancer. 

Cholesterol sulfate, synthesized in the skin epidermis, may have a role in
keratinocyte differentiation. Accordingly, hydrolysis of cholesterol sulfate by steroid
sulfatase may be important in skin formation and differentiation. This is the major organ 
affected in X-linked ichthyosis caused by mutations in ARSC. 

Although sulfation may widely serve to detoxify potent compounds, some sulfate 
conjugates are more biologically active than the corresponding parent compound.
Minoxidil and cicletanine are activated upon sulfation. Further, an inhibitor of ARSC
was shown to potentiate the memory enhancing effect of DHEAS. This suggests a role
for sulfates and sulfation in the central nervous system.

An important example of bioactivation by means of sulfation, however, occurs
with dietary and environmental mutagens and carcinogens. For a large number of these,
sulfation is the terminal step in the pathway to metabolic activation. Examples of such
chemicals include aromatic amines (including heterocyclic amines) and benzylic
alcohols of chemicals such as polycyclic aromatic hydrocarbons, safrole, and estragole.

The sulfatase gene family has been reviewed in Parenti et al. (Current Opinion in
Genetics and Development 7:386-391 (1997)), summarized below.

The sulfatase family of enzymes is functionally and structurally similar.
Nevertheless, these enzymes catalyze the hydrolysis of sulfate ester bonds from a wide
variety of substrates ranging from complex molecules such as glycosaminoglycans and
sulfolipids to steroid sulfates (see also Coughtrie et al., above). Several human genetic
disorders result from the accumulation of intermediate sulfate compounds that result
from a deficiency of single or multiple sulfatase activities. A subset of sulfatases, the ARSs, is
characterized by the ability to hydrolyze sulfate esters of chromogenic or fluorogenic
aromatic compounds such as p-nitrocatechol sulfate and 4-methylumbelliferyl sulfate.
Desulfation is required to degrade glycosaminoglycans, heparan sulfate, chondroitin
sulfate and dermatan sulfate and sulfolipids. Steroid sulfatase differs from other
members of the family with respect to subcellular localization. It is localized in the
microsomes rather than in lysosomes. Further, ARSD, ARSE, and ARSF are also
non-lysosomal, being localized in the endoplasmic reticulum or Golgi compartment.

The natural substrate of ARSA is cerebroside sulfate. Associated diseases are
MLD and MSD. The natural substrate of ARSB is dermatan sulfate. The diseases
associated with this enzyme are MPSVI and MSD. The natural substrate of ARSC/STS is
sulfated steroids. Diseases associated with this enzyme are XLI and MSD. The natural
substrates of ARSD-F are unknown. The natural substrates of iduronate-2-sulfate
sulfatase (IDS) are dermatan sulfate and heparan sulfate. Diseases associated with this
enzyme are MPSII and MSD. The natural substrates of galactose 6-sulfatase are keratan
sulfate and chondroitin 6-sulfate. Diseases associated with this enzyme include
MPSIVA and MSD. The natural substrates of glucosamine-6-sulfatase are heparan sulfate
and keratan sulfate. Diseases associated with this enzyme are MPSIIID and MSD. The
natural substrate of glucuronate-2-sulfatase is heparan sulfate. The natural substrate of
glucosamine-3-sulfatase is heparan sulfate.

Sulfatases are activated through conversion of a cysteine residue as described
above. The conversion is required for catalytic activity. It is
likely that all sulf[...]
was shown to d[...]
(ARSB). It has been shown that the modified residue and a metal ion are located at the
base of a substrate binding pocket.

Nine human sulfatase genes are known, and murine, rat, goat, or avian orthologs
for some of these have been identified. A high degree of similarity occurs particularly in 
the amino terminal region which contains accordingly a potential consensus sulfatase 



Sulfatases, as discussed above, are associated with human disease. Most 
sulfatase deficiencies cause lysosomal storage disorders. The mucopolysaccharidoses 



deformities, hepatosplenomegaly, and deformities of soft tissues caused by deficiencies 
of sulfatases acting on glycosaminoglycans. In metachromatic leukodystrophy, a
deficiency of ARSA causes the storage of sulfolipids in the central and peripheral
nervous systems, leading to neurologic deterioration. X-linked ichthyosis is caused by
STS deficiency leading to increased cholesterol sulfate levels. MSD, a disorder in which
all sulfatase activities are simultaneously defective, was shown to result from a defect in
the co- or post-translational processing of sulfatases.

Accordingly, sulfatases are a major target for drug action and development.
Therefore, it is valuable to the field of pharmaceutical development to identify and 
characterize previously unknown sulfatases. The present invention advances the state of 
the art by providing previously unidentified human sulfatases. 

SUMMARY OF THE INVENTION 

Novel sulfatase nucleotide sequences, and the deduced sulfatase polypeptides,
are described herein. Accordingly, the invention provides isolated sulfatase nucleic acid
molecules having the sequences shown in SEQ ID NOS:2, 4, 6, and 8 or in the cDNA
deposited with ATCC as Patent Deposit Number ____, PTA-1639, PTA-1846, or
____, respectively ("the deposited cDNA"), and variants and fragments thereof.

It is also an object of the invention to provide nucleic acid molecules encoding
the sulfatase polypeptides, and variants and fragments thereof. Such nucleic acid
molecules are useful as targets and reagents in sulfatase expression assays, are applicable
to treatment and diagnosis of sulfatase-related disorders and are useful for producing
novel sulfatase polypeptides by recombinant methods.

The invention thus further provides nucleic acid constructs comprising the 
nucleic acid molecules described herein. In a preferred embodiment, the nucleic acid 
molecules of the invention are operatively linked to a regulatory sequence. The 
invention also provides vectors and host cells for expressing the sulfatase nucleic acid 
molecules and polypeptides, and particularly recombinant vectors and host cells.

In another aspect, it is an object of the invention to provide isolated sulfatase 
polypeptides and fragments and variants thereof, including a polypeptide having the 
amino acid sequence shown in SEQ ID NOS: 1, 3, 5 or 7 or the amino acid sequences 
encoded by the deposited cDNAs. The disclosed sulfatase polypeptides are useful as 
reagents or targets in sulfatase assays and are applicable to treatment and diagnosis of
sulfatase-related disorders. 

The invention also provides assays for determining the activity of or the presence
or absence of the sulfatase polypeptides or nucleic acid molecules in a biological sample,
including for disease diagnosis. In addition, the invention provides assays for
determining the presence of a mutation in the polypeptides or nucleic acid molecules,
including for disease diagnosis.

A further object of the invention is to provide compounds that modulate 
expression of the sulfatase for treatment and diagnosis of sulfatase-related disorders. 
Such compounds may be used to treat conditions related to aberrant activity or 
expression of the sulfatase polypeptides or nucleic acids.

The disclosed invention further relates to methods and compositions for the
study, modulation, diagnosis and treatment of sulfatase-related disorders. The
compositions include sulfatase polypeptides, nucleic acids, vectors, transformed cells
and related variants thereof. In particular, the invention relates to the diagnosis and
treatment of sulfatase-related disorders including, but not limited to, disorders as
described in the background above, further herein, or involving a tissue shown in the
figures herein.

In yet another aspect, the invention provides antibodies or antigen-binding
fragments thereof that selectively bind the sulfatase polypeptides and fragments. Such
antibodies and antigen-binding fragments have use in the detection of the sulfatase
polypeptide, and in the prevention, diagnosis and treatment of sulfatase-related disorders.

The sulfatases disclosed herein are designated as follows: 22438, 23553, 25278,
and 26212.

DESCRIPTION OF THE DRAWINGS 

Figure 1 shows the 22438 sulfatase cDNA sequence (SEQ ID NO:2) and the
deduced amino acid sequence (SEQ ID NO:1). The 22438 sulfatase coding sequence is
set forth in SEQ ID NO:11.

Figure 2 shows a 22438 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:1) of
22438 sulfatase are indicated. Polypeptides of the invention include fragments which
include: all or a part of a hydrophobic sequence (a sequence above the dashed line); or
all or part of a hydrophilic fragment (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.
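
As an illustration of how a hydrophobicity plot of the kind described for Figure 2 might be computed, the following minimal Python sketch averages per-residue hydropathy values over a sliding window. The Kyte-Doolittle scale, the window size of 9, the placement of the dashed line at zero, and the function name hydropathy_profile are all assumptions made for illustration; the text does not state which scale or window was used to generate the figure.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=9):
    """Return (center_position, mean_hydropathy) pairs for each full window.

    Positions are 1-based. Windows that would run past either end of the
    sequence are skipped, so a sequence shorter than the window yields [].
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        return []
    values = [KD[aa] for aa in seq]  # raises KeyError on non-standard residues
    half = window // 2
    profile = []
    for start in range(len(seq) - window + 1):
        mean = sum(values[start:start + window]) / window
        profile.append((start + half + 1, mean))
    return profile


if __name__ == "__main__":
    # Hypothetical toy sequence: a hydrophilic stretch followed by a hydrophobic one.
    for position, score in hydropathy_profile("DDDDDKKKKKIIIIILLLLL", window=5):
        side = "above" if score > 0 else "below"
        print(f"{position:3d} {score:6.2f} ({side} the zero line)")
```

Under these assumptions, window averages greater than zero would plot above the dashed line (relatively hydrophobic) and averages below zero would plot beneath it (relatively hydrophilic).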

Figure 3 shows an analysis of the 22438 sulfatase amino acid sequence: alpha, beta, turn
and coil regions; hydrophilicity; amphipathic regions; flexible regions; antigenic 
index; and surface probability plot. 

Figure 4 shows an analysis of the 22438 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For N-myristoylation sites, the actual modified residue
is the first amino acid. In addition, an amidation site is found from about amino acids
56-59, an EGF-like domain cysteine pattern signature is found from about amino acids
260-271, and a sulfatase signature is found from about amino acids 129-138.
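
A functional-site analysis of the kind described for Figure 4 can be approximated by scanning a protein sequence with short sequence patterns. The sketch below is illustrative only: it assumes PROSITE-style consensus patterns for four of the site types (N-glycosylation, cAMP- and cGMP-dependent protein kinase, protein kinase C, and casein kinase II), written here as Python regular expressions. The text does not state which tool or patterns produced the figure, and the names SITE_PATTERNS and find_sites are hypothetical. The offset stored with each pattern records whether the modified residue is the first or the last amino acid of the match, following the convention stated above.

```python
import re

# Assumed PROSITE-style patterns, expressed as regular expressions.
# Each entry maps a site name to (regex, offset of the modified residue).
SITE_PATTERNS = {
    "N-glycosylation": (r"N[^P][ST][^P]", 0),          # first residue (N)
    "cAMP/cGMP-dependent kinase": (r"[RK]{2}.[ST]", 3),  # last residue (S/T)
    "Protein kinase C": (r"[ST].[RK]", 0),             # first residue (S/T)
    "Casein kinase II": (r"[ST].{2}[DE]", 0),          # first residue (S/T)
}


def find_sites(seq):
    """Return sorted (name, start, end, modified_position) tuples.

    All positions are 1-based and inclusive. A lookahead is used so that
    overlapping occurrences of the same pattern are all reported.
    """
    seq = seq.upper()
    hits = []
    for name, (pattern, offset) in SITE_PATTERNS.items():
        for match in re.finditer(f"(?=({pattern}))", seq):
            start = match.start(1) + 1
            end = match.end(1)
            hits.append((name, start, end, start + offset))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


if __name__ == "__main__":
    # Hypothetical toy peptide, not taken from any SEQ ID NO.
    for hit in find_sites("MKRRASVNGSAAEE"):
        print(hit)
```

Short patterns of this kind match frequently by chance, so hits would normally be treated as candidate sites rather than as evidence of modification.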

Figure 5 shows the 23553 sulfatase cDNA sequence (SEQ ID NO:4) and the 
deduced amino acid sequence (SEQ ID NO:3). The 23553 sulfatase coding sequence is
set forth in SEQ ID NO: 12. 

Figure 6 shows a 23553 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:3) of
23553 sulfatase are indicated. Polypeptides of the invention include fragments which
include: all or a part of a hydrophobic sequence (a sequence above the dashed line); or
all or part of a hydrophilic fragment (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.

Figure 7 shows an analysis of the 23553 sulfatase amino acid sequence: alpha, beta, turn
and coil regions; hydrophilicity; amphipathic regions; flexible regions; antigenic 
index; and surface probability plot. 

Figure 8 shows an analysis of the 23553 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For protein kinase C
phosphorylation sites, the actual modified residue is the first amino acid. For casein
kinase II phosphorylation sites, the actual modified residue is the first amino acid.
For the tyrosine kinase phosphorylation site, the actual modified residue is the last
amino acid residue. For N-myristoylation sites, the actual modified residue is the first
amino acid. In addition, a sulfatase signature is found from about amino acids 85-97.

Figure 9 shows relative expression of the 23553 sulfatase mRNA in normal and
cancerous human tissues.

Figure 10 shows the 25278 sulfatase cDNA sequence (SEQ ID NO:6) and the
deduced amino acid sequence (SEQ ID NO:5). The 25278 sulfatase coding sequence is
set forth in SEQ ID NO:13.

Figure 11 shows a 25278 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:5) of
25278 sulfatase are indicated. Polypeptides of the invention include fragments which
include: all or a part of a hydrophobic sequence (a sequence above the dashed line); or
all or part of a hydrophilic fragment (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.

Figure 12 shows an analysis of the 25278 sulfatase amino acid sequence:
alpha, beta, turn and coil regions; hydrophilicity; amphipathic regions; flexible regions;
antigenic index; and surface probability plot.

Figure 13 shows an analysis of the 25278 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For the tyrosine kinase phosphorylation site, the actual
modified residue is the last amino acid residue. For N-myristoylation sites, the actual
modified residue is the first amino acid. In addition, amidation sites are found from
about amino acids 312-315 and 541-544, and sulfatase signatures are found from
about amino acids 139-148 and 91-103.



Figure 14 shows relative expression of 25278 sulfatase mRNA in normal and
cancerous human tissues.

Figure 15 shows the 26212 sulfatase cDNA sequence (SEQ ID NO:8) and the
deduced amino acid sequence (SEQ ID NO:7). The 26212 sulfatase coding sequence is 
set forth in SEQ ID NO:14. 

Figure 16 shows a 26212 sulfatase hydrophobicity plot. Relatively hydrophobic
residues are shown above the dashed horizontal line, and relatively hydrophilic residues
are below the dashed horizontal line. The cysteine residues (cys) and N-glycosylation
site (Ngly) are indicated by short vertical lines just below the hydropathy trace. The
numbers corresponding to the amino acid sequence (shown in SEQ ID NO:7) of
26212 sulfatase are indicated. Polypeptides of the invention include fragments which
include: all or a part of a hydrophobic sequence (a sequence above the dashed line); or
all or part of a hydrophilic fragment (a sequence below the dashed line). Other
fragments include a cysteine residue or an N-glycosylation site.

Figure 17 shows an analysis of the 26212 sulfatase amino acid sequence:
alpha, beta, turn and coil regions; hydrophilicity; amphipathic regions; flexible regions;
antigenic index; and surface probability plot.

Figure 18 shows an analysis of the 26212 sulfatase open reading frame for
amino acids corresponding to specific functional sites. For the N-glycosylation sites,
the actual modified residue is the first amino acid. For cAMP- and cGMP-dependent
protein kinase phosphorylation sites, the actual modified residue is the last amino
acid. For protein kinase C phosphorylation sites, the actual modified residue is the
first amino acid. For casein kinase II phosphorylation sites, the actual modified
residue is the first amino acid. For the tyrosine kinase phosphorylation site, the actual
modified residue is the last amino acid residue. For N-myristoylation sites, the actual






WO 01/55411 PCT/US01/03266

modified residue is the first amino acid. In addition, sulfatase signature sites are 
found from about amino acids 168-177 and 120-132. 



Figure 19 depicts an alignment of the 22438 sulfatase domain with a
consensus amino acid sequence derived from a hidden Markov model. The upper
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower
amino acid sequence corresponds to amino acids 36 to 462 of SEQ ID NO:1.

Figure 20 depicts an alignment of the 23553 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower 
amino acid sequence corresponds to amino acids 43 to 467 of SEQ ID NO:3. 

Figure 21 shows the expression of 23553 in the following human carcinoma
cell lines: breast cancer cell lines MCF-7, ZR75, T47D, MDA231, and MDA435;
colon cancer cell lines DLD-1, SW480, SW620, HCT116, HT29, and Colo205; lung
cancer cell lines NCIH125, NCIH69, NCIH322, NCIH460, and A549. Expression
levels were determined by reverse transcriptase (RT) quantitative PCR (Taqman®
brand quantitative PCR kit, Applied Biosystems). The quantitative PCR reactions
were performed according to the kit manufacturer's instructions.
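
For orientation, relative expression from quantitative PCR data is often calculated with the comparative threshold-cycle (2^-ddCt) approach. The sketch below shows that calculation under stated assumptions: the text says only that the reactions followed the kit manufacturer's instructions, so the use of this particular method, the choice of a reference gene and calibrator sample, the assumption of roughly 100% amplification efficiency, and the function name relative_expression are all illustrative rather than taken from the source.

```python
def relative_expression(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Fold change of a target transcript relative to a calibrator sample.

    Implements the comparative Ct calculation 2 ** -(ddCt), where
    dCt = Ct(target) - Ct(reference gene) within one sample and
    ddCt = dCt(sample) - dCt(calibrator). Assumes the amount of product
    doubles each cycle (an assumption, not stated in the text).
    """
    delta_ct_sample = ct_target - ct_reference
    delta_ct_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_ct_sample - delta_ct_calibrator)


if __name__ == "__main__":
    # Hypothetical Ct values for a tumor sample versus a normal-tissue calibrator.
    fold = relative_expression(
        ct_target=24.0, ct_reference=18.0,
        ct_target_cal=27.0, ct_reference_cal=18.0,
    )
    print(f"Relative expression versus calibrator: {fold:.1f}-fold")
```

A lower Ct for the target in the sample than in the calibrator, after normalizing each to its reference gene, yields a fold change above 1.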

Figure 22 shows the expression of 23553 in clinical samples of normal human 
breast tissue and the following human breast tumor tissues: ductal in situ carcinoma 
(DCIS), invasive ductal carcinoma (IDC), and invasive lobular carcinoma (ILC). 
Expression levels were determined as described in the description of Figure 21.

Figure 23 shows the expression of 23553 in human clinical samples of normal
colon, colon tumor, metastatic liver, and normal liver tissue. Expression levels were
determined as described in the description of Figure 21.

Figure 24 shows the expression of 23553 in normal human lung and
adenocarcinoma (AC) and squamous cell carcinoma (SCC) lung tumor tissue. 
Expression levels were determined as described in the description of Figure 21. 

Figure 25 shows the expression of 23553 in the following normal human
tissues: prostate (column 1), liver (columns 2 and 3), breast (columns 4 and 5),
skeletal muscle (column 6), brain (columns 7 and 8), colon (columns 9 and 10), heart
(columns 11 and 12), ovary (columns 13 and 14), kidney (columns 15 and 16), lung
(columns 17 and 18), vein (columns 19 and 20), trachea (column 21), adipose
(columns 22 and 23), small intestine (column 24), thyroid (columns 25 and 26), skin
(columns 27 and 28), testes (column 29), placenta (column 30), fetal liver (columns
31 and 32), fetal heart (columns 33 and 34), osteoblasts (undifferentiated, column 35
and primary culture, column 36), fetal spinal cord (column 38), cervix (column 39),
spleen (column 40), spinal cord (column 41), thymus (column 42), tonsil (column 43),
lymph node (column 44), and aorta (column 45). 23553 was expressed at high levels
in trachea, vein, osteoblast, kidney, and testes tissue; significant expression of 23553
was noted in adipose, colon, skeletal muscle, thyroid, and prostate tissues. Expression
levels were determined as described in the description of Figure 21.

Figure 26 shows the expression of 23553 in the following human tissues:
normal brain (column 1), glioblastoma (columns 2-5), normal breast (column 6),
breast tumor (columns 7-9), normal colon (column 10), colon tumor (columns 11-13),
normal liver (column 14), metastatic colon (columns 15 and 16), normal lung (column
17), lung tumor (columns 18-20), placenta (column 21), fetal adrenal gland (column
22), normal skin (columns 23 and 24), and adipose (column 25). 23553 was
detectable in all tissues tested, with evidence of increased expression levels in breast,
colon, and lung tumors. In addition, 23553 was expressed at an elevated level in
glioblastoma tissue, as compared to normal brain tissue. Expression levels were
determined as described in the description of Figure 21.

Figure 27 depicts an alignment of the 25278 sulfatase domain with a 
consensus amino acid sequence derived from a hidden Markov model. The upper
sequence is the consensus amino acid sequence (SEQ ID NO:9), while the lower
amino acid sequence corresponds to amino acids 47 to 471 of SEQ ID NO:5. 

Figure 28 shows the relative expression of 25278 in various human tissues, as
follows: Row 1, NDR 19, breast, DCIS (ductal in situ carcinoma); Row 2, MDA
138, breast, normal; Row 3, NDR 01, breast, IDC (invasive ductal carcinoma); Row
4, NDR 15, breast, DC (ductal carcinoma); Row 5, NDR 133, breast, ILC (invasive
lobular carcinoma); Row 6, MDA 161, breast, IDC; Row 7, MDA 155, breast,
IDC/DCIS; Row 8, PIT 270, lung, normal; Row 9, CHT 427, lung, normal; Row 10,
PIT 241, lung, normal; Row 11, PIT 298, lung, normal; Row 12, CHT 800, lung, AC
(adenocarcinoma); Row 13, CHT 335, lung, SCC (squamous cell carcinoma); Row
14, CHT 447, lung, AC; Row 15, CHT 752, lung, AC; Row 16, CHT 799, lung, AC;
Row 17, CHT 369, lung, SCC; Row 18, CHT 369, lung, SCC; Row 19, CHT 371,
colon, normal; Row 20, CHT 396, colon, normal; Row 21, CHT 398, colon, normal;
Row 22, NDR 104, colon, normal; Row 23, CHT 520, colon, adenocarcinoma; Row
24, CHT 122, colon, adenocarcinoma; Row 25, CHT 536, colon, adenocarcinoma;
Row 26, CHT 528, colon, adenocarcinoma; Row 27, CHT 386, colon,
adenocarcinoma; Row 28, CHT 372, colon, adenocarcinoma; Row 29, CHT 532,
colon, adenocarcinoma; Row 30, CHT 77, liver, metastatic; Row 31, CHT 321, liver,
metastatic; Row 32, CHT 84, liver, metastatic; Row 33, NDR 100, liver, metastatic;
Row 34, NDR 154, liver, normal; Row 35, CHT 322, liver, normal; Row 36, PIT 51,
liver, normal; Row 37, CHT 339, liver, normal; Row 38, PIT 265, breast, normal;
Row 39, MDA 335, breast, normal; Row 40, NDR 132, breast, DCIS; Row 41, NDR
13, breast, normal; Row 42, NDR 56, breast, normal.

Figure 29 depicts an alignment of the 26212 sulfatase domain with a
consensus amino acid sequence derived from a hidden Markov model. The upper
sequence is the consensus amino acid sequence (SEQ ID NO:10), while the lower
amino acid sequence corresponds to amino acids 76 to 502 of SEQ ID NO:7.

Figure 30 shows the expression of 26212 in various human endothelial cells, as
follows: proliferating human umbilical vein endothelial cells (HUVEC) (column 1);
arresting HUVEC (column 2); HUVEC minus growth factor (column 3); proliferating
cardiac human microvascular endothelial cells (HMVEC) (columns 4 and 6); arresting
cardiac HMVEC (columns 5 and 7); proliferating lung HMVEC (columns 8, 11, and
13); arresting lung HMVEC (columns 9, 12, and 14); lung HMVEC minus growth
factor (columns 10 and 15); and HEK 293 (non-endothelial) cells (column 16). In six of six
independent experiments, 26212 is up-regulated in proliferating endothelial cells as
compared to arrested endothelial cells. Further, 26212 expression levels are higher in
proliferating endothelial cells than in HEK 293 (non-endothelial) cells. Expression
levels were determined as described in the description of Figure 21.

Figure 31 shows the expression of 26212 in the following human tissues. Figure
31A: normal breast (columns 1 and 2), breast tumor (columns 3-9), normal ovary
(columns 10 and 11), ovary tumor (columns 12-19), normal lung (columns 20-23), lung
tumor (columns 24-31). Figure 31B: normal colon (columns 1-4), colon tumor (columns
5-12), liver metastases (columns 13-16), normal liver (columns 17-18), normal brain
(columns 19-20), astrocyte (column 21), brain tumor (columns 22-25), arresting human
microvascular endothelial cells (column 26), proliferating human microvascular
endothelial cells (column 27), placenta (column 28), fetal adrenal tissue (columns 29-30),
and fetal liver (column 31). Expression levels were determined as described in
the description of Figure 21.

Figure 32 shows 26212 expression in normal human clinical breast samples 
(columns 1 and 2) and human clinical breast tumor samples (columns 3-9). Expression 
levels were determined as described in the description of Figure 21. 

Figure 33 shows 26212 expression in normal human clinical lung samples 
(columns 1-4) and human clinical lung tumor samples (columns 5-12). Expression
levels were determined as described in the description of Figure 21. 

Figure 34 shows the temporal expression of 26212 in human normal and breast
cancer epithelial cell lines (MCF10A and MCF3B, respectively) after treatment with
epidermal growth factor (EGF). MCF10A cells are shown 0, 0.5, 1, 2, 4, and 8 hours
after treatment with EGF (columns 1-6, respectively). Similarly, MCF3B cells are
shown 0, 0.5, 1, 2, 4, and 8 hours after treatment with EGF (columns 7-12, respectively).
26212 is up-regulated in both cell lines. Expression levels were determined as
described in the description of Figure 21.

Figure 35 shows expression of 26212 in human hemangiomas and other 
angiogenic tissues: hemangioma (ONC 101; column 1); hemangioma (ONC 102; 
column 2); hemangioma (ONC 103; column 3); skin (NDR 295; column 4); fetal heart 
(BWH4; column 5); normal heart (MPI 849; column 6); spinal cord (CKN 746; column 
7); uterine adenocarcinoma (CHT 1424; column 8); and endometrial polyps (CLN 944; 
column 9). Expression levels were determined as described in the description of 
Figure 21. 

Figure 36 shows expression of 26212 in the following human tissues: normal 
artery (column 1), normal vein (column 2), aortic smooth muscle cells (SMC), early 
(column 3), coronary SMC (column 4), static human umbilical vein endothelial cells 
(HUVEC) (column 5), shear HUVEC (column 6), normal heart (column 7), heart, 
congestive heart failure (CHF) (column 8), kidney (column 9), skeletal muscle (column 
10), normal adipose (column 11), pancreas (column 12), primary osteoblasts (column
13), osteoclasts, differentiated (column 14), normal skin (column 15), normal spinal cord
(column 16), normal brain cortex (column 17), normal brain hypothalamus (column 18),
nerve (column 19), dorsal root ganglion (DRG) (column 20), glial cells (astrocytes)
(column 21), glioblastoma (column 22), normal breast (column 23), breast tumor
(column 24), normal ovary (column 25), ovary tumor (column 26), normal prostate
(column 27), prostate tumor (column 28), prostate epithelial cells (column 29), normal
colon (column 30), colon tumor (column 31), normal lung (column 32), lung tumor
(column 33), lung, chronic obstructive pulmonary disease (COPD) (column 34), colon,
inflammatory bowel disease (IBD) (column 35), normal liver (column 36), liver fibrosis
(column 37), dermal cells, fibroblasts (column 38), normal spleen (column 39), normal
tonsil (column 40), lymph node (column 41), small intestine (column 42), skin,
decubitus (column 43), synovium (column 44), bone marrow mononuclear cells
(BM-MNC) (column 45), and activated peripheral blood mononuclear cells (PBMC) (column
46). The expression levels of 26212 are higher in endothelial and glial cells than in other
tissues and cells. Expression levels were determined as described in the description of
Figure 21.

DETAILED DESCRIPTION OF THE INVENTION

Sulfatase Polypeptides 

The invention is based on the identification of the novel human 22438
sulfatase. In situ hybridization experiments showed that this sulfatase is expressed in
the following monkey tissues: sub-populations of DRG neurons (mainly in small and
medium sized neurons), in spinal cord (interneurons and motor neurons), and in the
brain. The sulfatase is also expressed in human brain. The sulfatase cDNA was
identified based on consensus motifs or protein domains characteristic of sulfatases
and, in particular, arylsulfatase. BLAST analysis has shown homology with human
arylsulfatase E, human iduronate-2-sulfatase, human N-acetylgalactosamine-6-sulfatase,
murine arylsulfatase A, and human arylsulfatase A. However, some
homology has also been found with other arylsulfatases from various mammalian
species, including, but not limited to, human arylsulfatase D, E, F, and B.

The invention is also based on the identification of the novel human 23553
sulfatase. Taqman analysis has shown positive differential expression in breast and
colon cancer and in colonic metastases to the liver (Figure 9). This sulfatase has been
identified as a glucosamine-6-sulfatase based on ProDom matches and BLAST
analysis. Some homology has also been found to human arylsulfatase A, human
N-acetylglucosamine-6-sulfatase, and human iduronate-2-sulfatase.

The invention is also based on the identification of the novel human 25278
sulfatase. The sulfatase is differentially expressed in human colon cancer and in
colonic metastases to the liver, as determined by Taqman analysis. This sulfatase has
been identified as an N-acetylgalactosamine-4-sulfatase by ProDom matching and
BLAST homology alignment. Further, based on BLAST analysis, some homology
has also been shown to arylsulfatase B and arylsulfatase A.

The invention is also based on the identification of the novel human 26212
sulfatase. This sulfatase has been identified as an arylsulfatase by ProDom matching



WO 01/55411 PCT/US01/03266 
and BLAST sequence alignment. Homology has been shown to arylsulfatase B.
Some homology has also been found with arylsulfatase F, E, D, and A, as well as with
iduronate-2-sulfatase. Arylsulfatase B is also known as
N-acetylgalactosamine-4-sulfatase.

Specifically, newly-identified human genes, termed 22438, 23553, 25278, and
26212 sulfatases are provided. These sequences, and other nucleotide sequences
encoding the sulfatase proteins or fragments and variants thereof, are referred to as
"22438, 23553, 25278, and 26212 sulfatase sequences."

Plasmids containing the sulfatase cDNA inserts were deposited with the Patent
Depository of the American Type Culture Collection (ATCC), 10801 University
Boulevard, Manassas, Virginia, on , April 5, 2000, May 9, 2000, or , and
assigned Patent Deposit Numbers , PTA-1639, PTA-1846, or , respectively.
The deposits will be maintained under the terms of the Budapest Treaty on the
International Recognition of the Deposit of Microorganisms for the Purposes of
Patent Procedure. The deposits were made merely as a convenience for those of skill
in the art and are not an admission that a deposit is required under 35 U.S.C. §112.

The sulfatase cDNA was identified in human cDNA libraries. Specifically,
expressed sequence tags (ESTs) found in human cDNA libraries were selected based on
homology to known sulfatase sequences. Based on such EST sequences, primers were
designed to identify a full length clone from a human cDNA library. Positive clones
were sequenced and the overlapping fragments were assembled. The 22438, 23553,
25278, and 26212 sulfatase amino acid sequences are shown in Figures 1, 5, 10, and 15,
respectively, and SEQ ID NOS:1, 3, 5, and 7. The 22438, 23553, 25278, and 26212
sulfatase cDNA sequences are shown in Figures 1, 5, 10, and 15 and SEQ ID NOS:2, 4,
6, and 8.

Analysis of the assembled sequences revealed that the cloned cDNA 
molecules encoded sulfatase-like polypeptides. BLAST analysis indicated that the 
23553 sulfatase is a glucosamine-6-sulfatase, that the 25278 sulfatase is an N- 
acetylgalactosamine-4-sulfatase, that the 22438 is an arylsulfatase with highest 
homology to arylsulfatase A and E genes and that the 26212 sulfatase is an 
arylsulfatase with highest homology to the arylsulfatase B gene
(N-acetylgalactosamine-4-sulfatase).

The sulfatase sequences of the invention belong to the sulfatase family of
molecules having conserved functional features. The term "family" when referring to
the proteins and nucleic acid molecules of the invention is intended to mean two or
more proteins or nucleic acid molecules having sufficient amino acid or nucleotide
sequence identity as defined herein to provide a specific function. Such family
members can be naturally-occurring and can be from either the same or different 
species. For example, a family can contain a first protein of murine origin and an 
ortholog of that protein of human origin, as well as a second, distinct protein of 
human origin and a murine ortholog of that protein. 

The 22438 sulfatase gene encodes an approximately 2175 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:2. This transcript has
an open reading frame which encodes a 525 amino acid protein (SEQ ID NO:1).

The 23553 sulfatase gene encodes an approximately 4321 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:4. This transcript has
an open reading frame which encodes an 871 amino acid protein (SEQ ID NO:3).

The 25278 sulfatase gene encodes an approximately 2940 nucleotide mRNA
transcript having the corresponding cDNA set forth in SEQ ID NO:6. This transcript has
an open reading frame which encodes a 569 amino acid protein (SEQ ID NO:5).

The 26212 sulfatase gene encodes an approximately 2253 nucleotide mRNA 
transcript having the corresponding cDNA set forth in SEQ ID NO:8. This transcript has 
an open reading frame which encodes a 599 amino acid protein (SEQ ID NO:7). 
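
As a rough consistency check on the transcript and protein lengths stated above, the coding span implied by each protein can be compared with the approximate transcript length. The sketch below assumes three nucleotides per codon plus one stop codon; the names TRANSCRIPTS, coding_length, and noncoding_length are illustrative only and do not appear in the disclosure. Because the transcript lengths are stated as approximate, the remainders are approximate as well.

```python
# Approximate transcript length (nt) and protein length (aa), as stated in the text.
TRANSCRIPTS = {
    "22438": (2175, 525),
    "23553": (4321, 871),
    "25278": (2940, 569),
    "26212": (2253, 599),
}


def coding_length(protein_aa: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein: 3 per residue, plus a stop codon."""
    codons = protein_aa + (1 if include_stop else 0)
    return 3 * codons


def noncoding_length(transcript_nt: int, protein_aa: int) -> int:
    """Nucleotides left over for untranslated regions (assumption: one ORF)."""
    return transcript_nt - coding_length(protein_aa)


if __name__ == "__main__":
    for name, (nt, aa) in TRANSCRIPTS.items():
        print(name, coding_length(aa), noncoding_length(nt, aa))
```

In each case the open reading frame fits within the stated transcript length, with the remainder attributable to untranslated sequence.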

Prosite program analysis was used to predict various sites within the 22438 
sulfatase protein as shown in Figure 4. 

Prosite program analysis was used to predict various sites within the 23553 
sulfatase protein as shown in Figure 8. 

Prosite program analysis was used to predict various sites within the 25278 
sulfatase protein as shown in Figure 13. 

Prosite program analysis was used to predict various sites within the 26212
sulfatase protein as shown in Figure 18.

In situ hybridization experiments showed that 22438 is expressed in
subpopulations of DRG neurons, spinal cord, and brain, as disclosed hereinabove.
Expression of the 22438 sulfatase mRNA in the above cells and tissues
indicates that the sulfatase is likely to be involved in the proper function of and in
disorders involving these tissues. Accordingly, the disclosed invention further relates
to methods and compositions for the study, modulation, diagnosis and treatment of
sulfatase related disorders, especially disorders of these tissues that include, but are
not limited to those disclosed herein.

The 23553 sulfatase is differentially expressed in breast and colon cancer and 
in colonic metastases to the liver. Accordingly, the disclosed invention further relates 
to methods and compositions for the study, modulation, diagnosis and treatment in 
these tissues (normal and tumor). 

The 25278 sulfatase is differentially expressed in colon tumors and colonic 
metastases to the liver. Accordingly, the disclosed invention further relates to 
methods and compositions for the study, modulation, diagnosis and treatment in these
tissues (normal and tumor).

The 26212 sulfatase is differentially expressed in colon metastases and lung
tumors. Accordingly, the disclosed invention further relates to methods and
compositions for the study, modulation, diagnosis and treatment in these normal and 
tumor tissues. 

The compositions include sulfatase polypeptides, nucleic acids, vectors, 
transformed cells and related variants and fragments thereof, as well as agents that
modulate expression of the polypeptides and polynucleotides. In particular, the 
invention relates to the modulation, diagnosis and treatment of sulfatase related 
disorders as described herein. 

Treatment is defined as the application or administration of a therapeutic agent
to a patient, or application or administration of a therapeutic agent to an isolated tissue
or cell line from a patient, who has a disease, a symptom of disease or a predisposition
toward a disease, with the purpose to cure, heal, alleviate, relieve, alter, remedy,
ameliorate, improve or affect the disease, the symptoms of disease or the
predisposition toward disease. "Subject," as used herein, can refer to a mammal, e.g. a
human, or to an experimental or animal or disease model. The subject can also be a
non-human animal, e.g. a horse, cow, goat, or other domestic animal. A therapeutic
agent includes, but is not limited to, small molecules, peptides, antibodies, ribozymes
and antisense oligonucleotides.

Disorders involving the brain include, but are not limited to, disorders
involving neurons, and disorders involving glia, such as astrocytes, oligodendrocytes,
ependymal cells, and microglia; cerebral edema, raised intracranial pressure and
herniation, and hydrocephalus; malformations and developmental diseases, such as
neural tube defects, forebrain anomalies, posterior fossa anomalies, and syringomyelia
and hydromyelia; perinatal brain injury; cerebrovascular diseases, such as those
related to hypoxia, ischemia, and infarction, including hypotension, hypoperfusion,
and low-flow states (global cerebral ischemia) and focal cerebral ischemia (infarction
from obstruction of local blood supply), intracranial hemorrhage, including
intracerebral (intraparenchymal) hemorrhage, subarachnoid hemorrhage and ruptured
berry aneurysms, and vascular malformations, hypertensive cerebrovascular disease,
including lacunar infarcts, slit hemorrhages, and hypertensive encephalopathy;
infections, such as acute meningitis, including acute pyogenic (bacterial) meningitis
and acute aseptic (viral) meningitis, acute focal suppurative infections, including brain
abscess, subdural empyema, and extradural abscess, chronic bacterial
meningoencephalitis, including tuberculosis and mycobacterioses, neurosyphilis, and
neuroborreliosis (Lyme disease), viral meningoencephalitis, including
arthropod-borne (Arbo) viral encephalitis, Herpes simplex virus Type 1, Herpes simplex virus
Type 2, Varicella-zoster virus (Herpes zoster), cytomegalovirus, poliomyelitis, rabies,
and human immunodeficiency virus 1, including HIV-1 meningoencephalitis
(subacute encephalitis), vacuolar myelopathy, AIDS-associated myopathy, peripheral
neuropathy, and AIDS in children, progressive multifocal leukoencephalopathy,
subacute sclerosing panencephalitis, fungal meningoencephalitis, other infectious
diseases of the nervous system; transmissible spongiform encephalopathies (prion
diseases); demyelinating diseases, including multiple sclerosis, multiple sclerosis
variants, acute disseminated encephalomyelitis and acute necrotizing hemorrhagic
encephalomyelitis, and other diseases with demyelination; degenerative diseases, such
as degenerative diseases affecting the cerebral cortex, including Alzheimer disease
and Pick disease, degenerative diseases of basal ganglia and brain stem, including
Parkinsonism, idiopathic Parkinson disease (paralysis agitans), progressive
supranuclear palsy, corticobasal degeneration, multiple system atrophy, including
striatonigral degeneration, Shy-Drager syndrome, and olivopontocerebellar atrophy,
and Huntington disease; spinocerebellar degenerations, including spinocerebellar
ataxias, including Friedreich ataxia, and ataxia-telangiectasia, degenerative diseases
affecting motor neurons, including amyotrophic lateral sclerosis (motor neuron
disease), bulbospinal atrophy (Kennedy syndrome), and spinal muscular atrophy;
inborn errors of metabolism, such as leukodystrophies, including Krabbe disease,
metachromatic leukodystrophy, adrenoleukodystrophy, Pelizaeus-Merzbacher
disease, and Canavan disease, mitochondrial encephalomyopathies, including Leigh
disease and other mitochondrial encephalomyopathies; toxic and acquired metabolic
diseases, including vitamin deficiencies such as thiamine (vitamin B1) deficiency and
vitamin B12 deficiency, neurologic sequelae of metabolic disturbances, including
hypoglycemia, hyperglycemia, and hepatic encephalopathy, toxic disorders, including
carbon monoxide, methanol, ethanol, and radiation, including combined methotrexate
and radiation-induced injury; tumors, such as gliomas, including astrocytoma,
including fibrillary (diffuse) astrocytoma and glioblastoma multiforme, pilocytic
astrocytoma, pleomorphic xanthoastrocytoma, and brain stem glioma,
oligodendroglioma, and ependymoma and related paraventricular mass lesions,
neuronal tumors, poorly differentiated neoplasms, including medulloblastoma, other
parenchymal tumors, including primary brain lymphoma, germ cell tumors, and
pineal parenchymal tumors, meningiomas, metastatic tumors, paraneoplastic
syndromes, peripheral nerve sheath tumors, including schwannoma, neurofibroma,
and malignant peripheral nerve sheath tumor (malignant schwannoma), and
neurocutaneous syndromes (phakomatoses), including neurofibromatosis, including
Type 1 neurofibromatosis (NF1) and Type 2 neurofibromatosis (NF2), tuberous
sclerosis, and Von Hippel-Lindau disease.

Furthermore, as disclosed in the background hereinabove, specific disorders 
have been associated with function of the various sulfatases. Accordingly, the 
sulfatases disclosed herein, having homology to specific sulfatases as disclosed 
herein, are useful for diagnosis and treatment of the disorders associated with 
sulfatase dysfunction as disclosed herein and to modulation of gene expression in the 
affected tissues.

The sequences of the invention find use in diagnosis of disorders involving an
increase or decrease in sulfatase expression relative to normal expression, such as a
proliferative disorder, a differentiative disorder, or a developmental disorder. The
sequences also find use in modulating sulfatase-related responses. By "modulating" is
intended the upregulating or downregulating of a response. That is, the compositions
of the invention affect the targeted activity in either a positive or negative fashion.

The invention relates to novel sulfatases, having the deduced amino acid
sequence shown in Figures 1, 5, 10, and 15 (SEQ ID NOS:1, 3, 5, and 7) or having the
amino acid sequences encoded by the deposited cDNAs, Patent Deposit Numbers ,
PTA-1639, PTA-1846, or . The deposited sequences, as well as the polypeptides
encoded by the sequences, are incorporated herein by reference and control in the event
of any conflict, such as a sequencing error, with description in this application.

Thus, the present invention provides isolated or purified sulfatase
polypeptides and variants and fragments thereof. "Sulfatase polypeptide" or "sulfatase
protein" refers to the polypeptide in SEQ ID NOS:1, 3, 5, or 7 or encoded by the
deposited cDNAs. The term "sulfatase protein" or "sulfatase polypeptide," however,
further includes the numerous variants described herein, as well as fragments derived
from the full-length sulfatase and variants.

Sulfatase polypeptides can be purified to homogeneity. It is understood,
however, that preparations in which the polypeptide is not purified to homogeneity are
useful and considered to contain an isolated form of the polypeptide. The critical feature
is that the preparation allows for the desired function of the polypeptide, even in the
presence of considerable amounts of other components. Thus, the invention
encompasses various degrees of purity.

As used herein, a polypeptide is said to be "isolated" or "purified" when it is
substantially free of cellular material when it is isolated from recombinant and
non-recombinant cells, or free of chemical precursors or other chemicals when it is
chemically synthesized. A polypeptide, however, can be joined to another polypeptide
with which it is not normally associated in a cell and still be considered "isolated" or
"purified."

In one embodiment, the language "substantially free of cellular material"
includes preparations of sulfatase having less than about 30% (by dry weight) other
proteins (i.e., contaminating protein), less than about 20% other proteins, less than about
10% other proteins, or less than about 5% other proteins. When the polypeptide is
recombinantly produced, it can also be substantially free of culture medium, i.e., culture
medium represents less than about 20%, less than about 10%, or less than about 5% of
the volume of the protein preparation.

The sulfatase polypeptide is also considered to be isolated when it is part of a 
membrane preparation or is purified and then reconstituted with membrane vesicles or 
liposomes. 

The language "substantially free of chemical precursors or other chemicals"
includes preparations of the sulfatase polypeptide in which it is separated from chemical
precursors or other chemicals that are involved in its synthesis. The language
"substantially free of chemical precursors or other chemicals" includes, but is not limited
to, preparations of the polypeptide having less than about 30% (by dry weight) chemical
precursors or other chemicals, less than about 20% chemical precursors or other
chemicals, less than about 10% chemical precursors or other chemicals, or less than
about 5% chemical precursors or other chemicals.

In one embodiment, the sulfatase polypeptide comprises the amino acid sequence
shown in SEQ ID NOS:1, 3, 5, or 7. However, the invention also encompasses sequence
variants. By "variants" is intended proteins or polypeptides having an amino acid
sequence that is at least about 45%, 55%, 65%, preferably about 75%, 85%, 95%, or
98% identical to the amino acid sequence of SEQ ID NOS:1, 3, 5, or 7. Variants also
include polypeptides encoded by the cDNA insert of the plasmid deposited with
ATCC as Patent Deposit Numbers , PTA-1639, PTA-1846, or , or
polypeptides encoded by a nucleic acid molecule that hybridizes to the nucleic acid
molecule of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or a complement thereof, under
stringent conditions. In another embodiment, a variant of an isolated polypeptide of
the present invention differs by at least 1, but less than 5, 10, 20, 50, or 100 amino
acid residues from the sequence shown in SEQ ID NO:1, 3, 5, or 7. If alignment is
needed for this comparison, the sequences should be aligned for maximum identity.
"Looped" out sequences from deletions or insertions, or mismatches, are considered
differences. Such variants generally retain the functional activity of the 22438-like,
23553-like, 25278-like, or 26212-like proteins of the invention. Variants include
polypeptides that differ in amino acid sequence due to natural allelic variation or
mutagenesis.
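
The residue-difference criterion described above can be illustrated with a short sketch. It assumes the two sequences have already been aligned for maximum identity and are supplied as equal-length strings with "-" marking looped-out (gap) positions; the function names count_differences and differs_by_fewer_than are hypothetical and are not taken from the disclosure.

```python
def count_differences(aligned_ref: str, aligned_var: str) -> int:
    """Count aligned columns that differ.

    Mismatches and looped-out (gap, '-') positions both count as differences.
    Assumes the inputs are already aligned and therefore equal in length.
    """
    if len(aligned_ref) != len(aligned_var):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for r, v in zip(aligned_ref, aligned_var) if r != v)


def differs_by_fewer_than(aligned_ref: str, aligned_var: str, limit: int) -> bool:
    """True if the variant differs by at least 1 but fewer than `limit` residues."""
    n = count_differences(aligned_ref, aligned_var)
    return 1 <= n < limit


if __name__ == "__main__":
    ref = "ACDEFG"  # toy sequence, not a sulfatase sequence
    print(count_differences(ref, "ACDKFG"))         # one mismatch
    print(count_differences(ref, "AC--FG"))         # two looped-out residues
    print(differs_by_fewer_than(ref, "AC--FG", 5))
```

An identical sequence yields zero differences and therefore does not satisfy the "at least 1" condition.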

Variants include a substantially homologous protein encoded by the same genetic
locus in an organism, i.e., an allelic variant. Variants also encompass proteins derived
from other genetic loci in an organism, but having substantial homology to the sulfatase
of SEQ ID NOS:1, 3, 5, or 7. Variants also include proteins substantially homologous to
the sulfatase but derived from another organism, i.e., an ortholog. Variants also include
proteins that are substantially homologous to the sulfatase that are produced by chemical
synthesis. Variants also include proteins that are substantially homologous to the
sulfatase that are produced by recombinant methods. Variants retain the biological
activity (for example, sulfatase activity) of the polypeptide set forth by the reference
sequence (SEQ ID NOS:1, 3, 5, or 7). It is understood, however, that variants exclude
any amino acid sequences disclosed prior to the invention.

Preferred sulfatase polypeptides of the present invention have an amino acid
sequence sufficiently identical to the amino acid sequence of SEQ ID NOS:1, 3, 5, or
7. The term "sufficiently identical" is used herein to refer to a first amino acid or
nucleotide sequence that contains a sufficient or minimum number of identical or
equivalent (e.g., with a similar side chain) amino acid residues or nucleotides to a
second amino acid or nucleotide sequence such that the first and second amino acid or
nucleotide sequences have a common structural domain and/or common functional
activity. For example, amino acid or nucleotide sequences that contain a common
structural domain having at least about 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%,
96%, 97%, 98% or 99% identity are defined herein as sufficiently identical.

In one embodiment, a variant of the 23553 sulfatase is greater than 92%
homologous. In another embodiment, a variant of the 25278 sulfatase is greater than
50% identical. In another embodiment, the 26212 sulfatase is greater than 50%
identical.

To determine the percent identity of two amino acid sequences, or of two
nucleic acid sequences, the sequences are aligned for optimal comparison purposes
(e.g., gaps can be introduced in one or both of a first and a second amino acid or
nucleic acid sequence for optimal alignment and non-homologous sequences can be
disregarded for comparison purposes). In a preferred embodiment, the length of a
reference sequence aligned for comparison purposes is at least 30%, preferably at
least 40%, more preferably at least 50%, even more preferably at least 60%, and even
more preferably at least 70%, 80%, 90%, 100% of the length of the reference
sequence. The amino acid residues or nucleotides at corresponding amino acid
positions or nucleotide positions are then compared. When a position in the first
sequence is occupied by the same amino acid residue or nucleotide as the
corresponding position in the second sequence, then the molecules are identical at that
position (as used herein amino acid or nucleic acid "identity" is equivalent to amino
acid or nucleic acid "homology"). The percent identity between the two sequences is
a function of the number of identical positions shared by the sequences, taking into
account the number of gaps, and the length of each gap, which need to be introduced
for optimal alignment of the two sequences.
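
The position-by-position comparison described above can be sketched as follows. This is a minimal illustration rather than the GAP or ALIGN implementation: it assumes the alignment has already been computed, that gaps are written as "-", and that percent identity is taken as identical positions divided by the number of aligned columns (gap columns included), which is one common convention. The function name percent_identity is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over an existing pairwise alignment.

    Identical, non-gap positions are counted as matches. Every aligned column,
    including columns with a gap in one sequence, counts toward the denominator,
    so each gap position lowers the score. Columns that are gaps in both
    sequences carry no information and are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap and b == gap:
            continue
        columns += 1
        if a == b:
            matches += 1
    if columns == 0:
        raise ValueError("alignment has no comparable columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Toy alignment: six identical positions and one single-residue gap.
    print(round(percent_identity("ACDE-FG", "ACDEKFG"), 2))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the convention should be stated whenever a threshold is applied.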

The comparison of sequences and determination of percent identity between
two sequences can be accomplished using a mathematical algorithm. In a preferred
embodiment, the percent identity between two amino acid sequences is determined
using the Needleman and Wunsch (1970) J. Mol. Biol. 48:444-453 algorithm which
has been incorporated into the GAP program in the GCG software package (available
at http://www.gcg.com), using either a Blossum 62 matrix or a PAM250 matrix, and a
gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet
another preferred embodiment, the percent identity between two nucleotide sequences
is determined using the GAP program in the GCG software package (available at
http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50,
60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. A particularly preferred set of
parameters (and the one that should be used if the practitioner is uncertain about what
parameters should be applied to determine if a molecule is within a sequence identity
or homology limitation of the invention) is using a Blossum 62 scoring matrix with a
gap open penalty of 12, a gap extend penalty of 4, and a frameshift gap penalty of 5.

The percent identity between two amino acid or nucleotide sequences can be
determined using the algorithm of E. Meyers and W. Miller (1989) CABIOS 4:11-17
which has been incorporated into the ALIGN program (version 2.0), using a PAM120
weight residue table, a gap length penalty of 12 and a gap penalty of 4.

The nucleic acid and protein sequences described herein can be used as a
"query sequence" to perform a search against public databases to, for example,
identify other family members or related sequences. Such searches can be performed
using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (1990) J.
Mol. Biol. 215:403-10. BLAST nucleotide searches can be performed with the
NBLAST program, score = 100, wordlength = 12 to obtain nucleotide sequences
homologous to the nucleic acid molecules of the invention. BLAST protein searches
can be performed with the XBLAST program, score = 50, wordlength = 3 to obtain
amino acid sequences homologous to the protein molecules of the invention. To
obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as
described in Altschul et al. (1997) Nucleic Acids Res. 25(17):3389-3402. When
utilizing BLAST and Gapped BLAST programs, the default parameters of the
respective programs (e.g., XBLAST and NBLAST) can be used. See
http://www.ncbi.nlm.nih.gov.

The invention also encompasses polypeptides having a lower degree of
identity but having sufficient similarity so as to perform one or more of the same
functions performed by the sulfatase. Similarity is determined by conservative amino
acid substitution, as shown in Table 1. Such substitutions are those that substitute a
given amino acid in a polypeptide by another amino acid of like characteristics.
Conservative substitutions are likely to be phenotypically silent. Typically seen as
conservative substitutions are the replacements, one for another, among the aliphatic
amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser and Thr;
exchange of the acidic residues Asp and Glu; substitution between the amide residues
Asn and Gln; exchange of the basic residues Lys and Arg; and replacements among
the aromatic residues Phe and Tyr. Guidance concerning which amino acid changes are
likely to be phenotypically silent is found in Bowie et al., Science 247:1306-1310
(1990).

Table 1

Aromatic       Phenylalanine
               Tryptophan
               Tyrosine

Hydrophobic    Leucine
               Isoleucine
               Valine

Polar          Glutamine
               Asparagine

Basic          Arginine
               Lysine
               Histidine

Acidic         Aspartic Acid
               Glutamic Acid

Small          Alanine
               Serine
               Threonine
               Methionine
               Glycine
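
The groupings of Table 1 lend themselves to a simple lookup. The sketch below is illustrative only: it encodes the six Table 1 classes using standard one-letter amino acid codes and treats a substitution as conservative when both residues fall in the same class. The names CONSERVATIVE_GROUPS and is_conservative are hypothetical, and residues absent from Table 1 (for example cysteine and proline) are simply never reported as conservative.

```python
# Table 1 classes, written with standard one-letter amino acid codes.
CONSERVATIVE_GROUPS = {
    "aromatic": {"F", "W", "Y"},             # Phe, Trp, Tyr
    "hydrophobic": {"L", "I", "V"},          # Leu, Ile, Val
    "polar": {"Q", "N"},                     # Gln, Asn
    "basic": {"R", "K", "H"},                # Arg, Lys, His
    "acidic": {"D", "E"},                    # Asp, Glu
    "small": {"A", "S", "T", "M", "G"},      # Ala, Ser, Thr, Met, Gly
}


def is_conservative(original: str, replacement: str) -> bool:
    """True if replacing `original` with `replacement` stays within one Table 1 class.

    An unchanged residue is not a substitution and returns False.
    """
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS.values())


if __name__ == "__main__":
    print(is_conservative("D", "E"))  # acidic for acidic
    print(is_conservative("D", "K"))  # acidic for basic
```

A lookup of this kind only classifies a substitution; whether a given change is in fact phenotypically silent still has to be established experimentally.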



A variant polypeptide can differ in amino acid sequence by one or more
substitutions, deletions, insertions, inversions, fusions, and truncations or a
combination of any of these. Variant polypeptides can be fully functional or can lack
function in one or more activities. Thus, in the present case, variations can affect the
function, for example, of one or more of regions including a metal (e.g., Ca2+)-
binding domain, activation domain, sulfatase catalytic domain, the region containing a
propeptide, regulatory regions, substrate binding regions, regions involved in
membrane association or subcellular localization, regions involved in
post-translational modification, for example, by phosphorylation, and regions that are
important for effector function (i.e., agents that act upon the protein, such as in the
conversion of cysteine to 2-amino-3-oxopropionic acid or serine semi-aldehyde).

Fully functional variants typically contain only conservative variation or
variation in non-critical residues or in non-critical regions. Functional variants can also
contain substitution of similar amino acids, which results in no change or an insignificant
change in function. Alternatively, such substitutions may positively or negatively affect
function to some degree.

Non-functional variants typically contain one or more non-conservative amino
acid substitutions, deletions, insertions, inversions, or truncation or a substitution,
insertion, inversion, or deletion in a critical residue or critical region.

As indicated, variants can be naturally-occurring or can be made by recombinant
means or chemical synthesis to provide useful and novel characteristics for the sulfatase
polypeptide. This includes preventing immunogenicity from pharmaceutical
formulations by preventing protein aggregation.

Useful variations further include alteration of functional activity. For example,
one embodiment involves a variation at the substrate binding site that results in binding
but not hydrolysis, or in more or less hydrolysis of the substrate than wild type. A further
useful variation at the same site can result in altered affinity for the substrate. Useful
variations also include changes that provide for affinity for another substrate. Useful
variations further include the ability to bind an effector molecule with greater or lesser
affinity, such as not to bind or to bind but not release it. Further useful variations
include alteration in the ability of the propeptide to be cleaved by a cleavage protein,
including alteration in the binding or recognition site. Further, the cleavage site can
also be modified so that recognition and cleavage are by a different protease. A
specific useful variation involves a variation in the ability to be bound or activated by
the enzyme that activates the sulfatase by the conversion of cysteine to
2-amino-3-oxopropionic acid or serine semi-aldehyde. Further variation could include a
variation in the specificity of metal binding.
30 Another useful variation provides a fusion protein in which one or more domains 

or subregions are operationally fused to one or more domains, subregjons, or motifs 
from another sulfatase. For example, a transmembrane domain from a protein can be 
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introduced into the sulfatase such that the protein is anchored in die cell surface. 
Other permutations include changing the number of sulfatase domains, and mixing of 
sulfatase domains from different sulfatase families, so that substrate specificity is 
altered. Mixing these various domains can allow the formation of novel sulfatase 
5 molecules with different host cell subcellular localization, substrate, and effector 
molecule (one that acts on the sulfatase) specificity. 

The term "substrate" is intended to refer not only to the sulfated substrate that
is cleaved by the sulfatase domain, but to refer to any component with which the
polypeptide interacts in order to produce an effect on that component or a subsequent
biological effect that is a result of interacting with that component. This can include,
but is not limited to, for example, interaction with the sulfatase activation enzyme and
components involved in the conversion of 3'-phosphoadenosine 5'-phosphosulfate to
adenosine 3',5'-diphosphate.

Amino acids that are essential for function can be identified by methods known
in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis
(Cunningham et al. (1989) Science 244:1081-1085). The latter procedure introduces
single alanine mutations at every residue in the molecule. The resulting mutant
molecules are then tested for biological activity, such as peptide bond hydrolysis in vitro
or related biological activity, such as proliferative activity. Sites that are critical for
binding can also be determined by structural analysis such as crystallization, nuclear
magnetic resonance or photoaffinity labeling (Smith et al. (1992) J. Mol. Biol. 224:899-
904; de Vos et al. (1992) Science 255:306-312).
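
By way of illustration only, the enumeration step of alanine-scanning mutagenesis described above can be sketched in Python. The function name `alanine_scan`, the mutant labels, and the toy sequence are hypothetical and are not taken from SEQ ID NOS:1, 3, 5, or 7; the sketch lists the single-alanine mutants to be made and does not model any activity assay. Positions that are already alanine are skipped, which is an assumption of this sketch.

```python
def alanine_scan(sequence):
    """Yield (label, mutant) for every single-residue alanine substitution.

    Labels use the conventional form <wild-type residue><position>A,
    with positions numbered from 1.
    """
    for index, residue in enumerate(sequence):
        if residue == "A":
            continue  # already alanine, so there is nothing to substitute
        label = f"{residue}{index + 1}A"
        yield label, sequence[:index] + "A" + sequence[index + 1:]


# Hypothetical five-residue peptide, used only to show the output shape.
toy_peptide = "MCAPS"
mutants = dict(alanine_scan(toy_peptide))
for label, mutant in mutants.items():
    print(label, mutant)
```

Each mutant in the resulting table would then be expressed and tested for activity as described above; residues whose replacement abolishes activity are candidates for essential positions.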

The invention thus also includes polypeptide fragments of the sulfatases. 
Fragments can be derived from the amino acid sequence shown in SEQ ID NOS:1, 3, 5,
or 7. However, the invention also encompasses fragments of the variants of the sulfatase
polypeptides as described herein. The fragments to which the invention pertains, 
however, are not to be construed as encompassing fragments that may be disclosed prior 
to the present invention. 

A fragment can comprise at least about 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 30, 35, 40, 45, 50 or more contiguous amino acids. Fragments can
retain one or more of the biological activities of the protein, for example as discussed
above. Fragments also include those that can be used as an immunogen to generate
sulfatase antibodies.
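
As a minimal sketch of what "contiguous amino acids" means in practice, the following Python function lists every fragment of a chosen length from a sequence. The function name `contiguous_fragments` and the 16-residue toy sequence are hypothetical illustrations and do not correspond to any sequence disclosed herein.

```python
def contiguous_fragments(sequence, length):
    """Return every fragment of `length` contiguous residues, in order."""
    if length < 1:
        raise ValueError("fragment length must be at least 1")
    # A sequence of n residues has n - length + 1 windows of that length.
    return [sequence[start:start + length]
            for start in range(len(sequence) - length + 1)]


# Hypothetical 16-residue sequence, for illustration only.
toy_sequence = "MKTLLVAGSWQRDPNE"
fragments = contiguous_fragments(toy_sequence, 10)
print(len(fragments), fragments[0], fragments[-1])
```

A sequence shorter than the requested length simply yields an empty list.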

For example, for the 25278 sulfatase, the invention encompasses amino acid
fragments greater than 5 amino acids, particularly from regions up to around
nucleotide 450 and beyond around nucleotide 1520. Specific fragments which may
be excluded include those that are underlined in Figure 1. However, even in regions
between around nucleotide 450 to around nucleotide 1520, fragments include those
that are five or greater excluding those which may have been disclosed prior to the
present invention.

For the 23553 sulfatase, fragments particularly include fragments of 5 amino
acids or more up to around nucleotide 670.

For the 26212 sulfatase, for example, fragments containing 5 or more amino
acids up to about nucleotide 572 are particularly encompassed by the invention.
However, fragments of 5 amino acids or more encoded by around nucleotide 572 to
around nucleotide 1985 are also encompassed by the invention with the understanding
that such fragments do not encompass those which may have been disclosed prior to
the invention. For example, these can include the sections underlined in Figure 15.

Biologically active fragments (peptides which are, for example, about 5, 10,
15, 20, 25, 30, 35, 40, 50, 100 or more amino acids in length) can comprise a
functional site. Such sites include but are not limited to those discussed above, such as a
catalytic site, regulatory site, site important for substrate recognition or binding, regions
containing a sulfatase domain or motif, phosphorylation sites, glycosylation sites, and
other functional sites disclosed herein.

Fragments, for example, can extend in one or both directions from the functional
site to encompass 5, 10, 15, 20, 30, 40, 50, or up to 100 amino acids. Further, fragments
can include sub-fragments of the specific sites or regions disclosed herein, which
sub-fragments retain the function of the site or region from which they are derived.

The invention also provides fragments with immunogenic properties. These
contain an epitope-bearing portion of the sulfatase polypeptide and variants. These
epitope-bearing peptides are useful to raise antibodies that bind specifically to a sulfatase
polypeptide or region or fragment. These peptides can contain at least 10, 12, at least 14,
or between at least about 15 to about 30 amino acids. The epitope-bearing sulfatase
polypeptides may be produced by any conventional means (Houghten, R.A. (1985)
Proc. Natl. Acad. Sci. USA 82:5131-5135). Simultaneous multiple peptide synthesis is
described in U.S. Patent No. 4,631,211.

Examples of antigenic polypeptides that can be used to generate
antibodies include, but are not limited to, peptides derived from extracellular regions.
Regions having a high antigenicity index are shown in Figures 3, 7, 12, and 17.
However, intracellularly-made antibodies ("intrabodies") are also encompassed, which
would recognize intracellular peptide regions.

Fragments can be discrete (not fused to other amino acids or polypeptides) or can 
be within a larger polypeptide. Further, several fragments can be comprised within a 
single larger polypeptide. In one embodiment a fragment designed for expression in a 
host can have heterologous pre- and pro-polypeptide regions fused to the amino terminus 
of the sulfatase polypeptide fragment and an additional region fused to the carboxyl 
terminus of the fragment.

The invention thus provides chimeric or fusion proteins. These comprise a 
sulfatase peptide sequence operatively linked to a heterologous peptide having an amino 
acid sequence not substantially homologous to the sulfatase polypeptide. "Operatively 
linked" indicates that the sulfatase polypeptide and the heterologous peptide are fused
in-frame. The heterologous peptide can be fused to the N-terminus or C-terminus of the
sulfatase polypeptide or can be internally located. 

In one embodiment the fusion protein does not affect sulfatase function per se. 
For example, the fusion protein can be a GST-fusion protein in which sulfatase 
sequences are fused to the N- or C-terminus of the GST sequences. Other types of 
fusion proteins include, but are not limited to, enzymatic fusion proteins, for example 
beta-galactosidase fusions, yeast two-hybrid GAL4 fusions, poly-His fusions and Ig 
fusions. Such fusion proteins, particularly poly-His fusions, can facilitate the 
purification of recombinant sulfatase polypeptide. In certain host cells (e.g., mammalian 
host cells), expression and/or secretion of a protein can be increased by using a 
heterologous signal sequence. Therefore, in another embodiment, the fusion protein 
contains a heterologous signal sequence at its C- or N-terminus. 

EP-A-0 464 533 discloses fusion proteins comprising various portions of
immunoglobulin constant regions. The Fc is useful in therapy and diagnosis and thus
results, for example, in improved pharmacokinetic properties (EP-A 0 232 262). In drug
discovery, for example, human proteins have been fused with Fc portions for the purpose
of high-throughput screening assays to identify antagonists (Bennett et al. (1995) J. Mol.
Recog. 8:52-58 and Johanson et al. (1995) J. Biol. Chem. 270:9459-9471). Thus, this
invention also encompasses soluble fusion proteins containing a sulfatase polypeptide
and various portions of the constant regions of heavy or light chains of immunoglobulins
of various subclasses (IgG, IgM, IgA, IgE). Preferred as immunoglobulin is the constant
part of the heavy chain of human IgG, particularly IgG1, where fusion takes place at the
hinge region. For some uses it is desirable to remove the Fc after the fusion protein has
been used for its intended purpose, for example when the fusion protein is to be used as
antigen for immunizations. In a particular embodiment, the Fc part can be removed in a
simple way by a cleavage sequence, which is also incorporated and can be cleaved with
factor Xa.

A chimeric or fusion protein can be produced by standard recombinant DNA
techniques. For example, DNA fragments coding for the different protein sequences are
ligated together in-frame in accordance with conventional techniques. In another
embodiment, the fusion gene can be synthesized by conventional techniques including
automated DNA synthesizers. Alternatively, PCR amplification of gene fragments can
be carried out using anchor primers which give rise to complementary overhangs
between two consecutive gene fragments which can subsequently be annealed and
reamplified to generate a chimeric gene sequence (see Ausubel et al. (1992) Current
Protocols in Molecular Biology). Moreover, many expression vectors are commercially
available that already encode a fusion moiety (e.g., a GST protein). A
sulfatase-encoding nucleic acid can be cloned into such an expression vector such that
the fusion moiety is linked in-frame to sulfatase.
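
The requirement that the two coding sequences be joined "in-frame" can be checked computationally before any cloning is attempted. The Python sketch below is illustrative only: the function name `fuse_in_frame` and the short toy sequences are hypothetical (they are not a real GST or sulfatase coding sequence), and the check assumes both inputs start at the first base of a codon and use the standard stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(upstream_orf, downstream_orf):
    """Join two coding sequences so the downstream one is read in frame.

    A terminal stop codon on the upstream sequence is dropped so that
    translation continues into the downstream sequence.
    """
    upstream = upstream_orf.upper()
    downstream = downstream_orf.upper()
    if upstream[-3:] in STOP_CODONS:
        upstream = upstream[:-3]
    if len(upstream) % 3 != 0 or len(downstream) % 3 != 0:
        raise ValueError("each part must be a whole number of codons")
    fused = upstream + downstream
    codons = [fused[i:i + 3] for i in range(0, len(fused), 3)]
    # Only the final codon may be a stop; an earlier one truncates the fusion.
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        raise ValueError("internal stop codon in fused sequence")
    return fused


# Hypothetical toy sequences standing in for a fusion moiety and an insert.
toy_tag = "ATGTCCCCTTAA"
toy_insert = "GGATCCTGTGCTTGA"
fusion = fuse_in_frame(toy_tag, toy_insert)
print(fusion)
```

A part whose length is not a multiple of three raises an error, which is the computational counterpart of a frameshift at the junction.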

Another form of fusion protein is one that directly affects sulfatase functions. 
Accordingly, a sulfatase polypeptide is encompassed by the present invention in which 
one or more of the sulfatase regions (or parts thereof) has been replaced by heterologous 
or homologous regions (or parts thereof) from another sulfatase. Accordingly, various 
permutations are possible, for example, as discussed above. Thus, chimeric sulfatases 
can be formed in which one or more of the native domains or subregions has been 
duplicated, removed, or replaced by another. This includes but is not limited to catalytic
sulfatase or substrate binding domains, and regions involved in activation. 

It is understood however that such regions could be derived from a sulfatase that 
has not yet been characterized. Moreover, sulfatase function can be derived from 
peptides that contain these functions but are not in a sulfatase family.

The isolated 22438 sulfatase protein can be purified from cells that naturally
express it, such as DRG neurons, including small and medium sized neurons, spinal
cord, including interneurons and motor neurons, and brain, especially purified from cells
that have been altered to express it (recombinant), or synthesized using known protein
synthesis methods.

The isolated 23553 sulfatase protein can be purified from cells that naturally 
express it, such as cells from any of the tissues shown in Figures 9 and 21-26, especially 
purified from cells that have been altered to express it (recombinant), or synthesized 
using known protein synthesis methods. 

The isolated 25278 sulfatase protein can be purified from cells that naturally 
express it, such as cells from any of the tissues shown in Figures 14 and 28, especially 
purified from cells that have been altered to express it (recombinant), or synthesized 
using known protein synthesis methods. 

The isolated 26212 sulfatase protein can be purified from cells that naturally
express it, such as cells from any of the tissues shown in Figures 30-36, especially 
purified from cells that have been altered to express it (recombinant), or synthesized 
using known protein synthesis methods. 

In one embodiment, the protein is produced by recombinant DNA techniques. 
For example, a nucleic acid molecule encoding the sulfatase polypeptide is cloned into 
an expression vector, the expression vector introduced into a host cell and the protein 
expressed in the host cell. The protein can then be isolated from the cells by an 
appropriate purification scheme using standard protein purification techniques. 

Polypeptides often contain amino acids other than the 20 amino acids commonly
referred to as the 20 naturally-occurring amino acids. Further, many amino acids,
including the terminal amino acids, may be modified by natural processes, such as
processing and other post-translational modifications, or by chemical modification
techniques well known in the art. Common modifications that occur naturally in
polypeptides are described in basic texts, detailed monographs, and the research
literature, and they are well known to those of skill in the art.

Accordingly, the polypeptides also encompass derivatives or analogs in which a 
substituted amino acid residue is not one encoded by the genetic code, in which a 
substituent group is included, in which the mature polypeptide is fused with another
compound, such as a compound to increase the half-life of the polypeptide (for example, 
polyethylene glycol), or in which the additional amino acids are fused to the mature 
polypeptide, such as a leader or secretory sequence or a sequence for purification of the 
mature polypeptide or a pro-protein sequence. 

Known modifications include, but are not limited to, acetylation, acylation, 
ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a 
heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent 
attachment of a lipid or lipid derivative, covalent attachment of phosphatidylinositol, 
cross-linking, cyclization, disulfide bond formation, demethylation, formation of
covalent crosslinks, formation of cystine, formation of pyroglutamate, formylation,
gamma carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination,
methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, 
prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of 
amino acids to proteins such as arginylation, and ubiquitination. 

Such modifications are well-known to those of skill in the art and have been 
described in great detail in the scientific literature. Several particularly common 
modifications, glycosylation, lipid attachment, sulfation, gamma-carboxylation of 
glutamic acid residues, hydroxylation and ADP-ribosylation, for instance, are described 
in most basic texts, such as Proteins - Structure and Molecular Properties, 2nd ed., T.E.
Creighton, W. H. Freeman and Company, New York (1993). Many detailed reviews are
available on this subject, such as by Wold, F., Posttranslational Covalent Modification
of Proteins, B.C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al.
(1990) Meth. Enzymol. 182:626-646; and Rattan et al. (1992) Ann. N.Y. Acad. Sci.
663:48-62.

As is also well known, polypeptides are not always entirely linear. For instance, 
polypeptides may be branched as a result of ubiquitination, and they may be circular, 
with or without branching, generally as a result of post-translation events, including 
natural processing events and events brought about by human manipulation which do not
occur naturally. Circular, branched and branched circular polypeptides may be 
synthesized by non-translational natural processes and by synthetic methods. 

Modifications can occur anywhere in a polypeptide, including the peptide
backbone, the amino acid side-chains and the amino or carboxyl termini. Blockage of
the amino or carboxyl group in a polypeptide, or both, by a covalent modification, is
common in naturally-occurring and synthetic polypeptides. For instance, the
amino-terminal residue of polypeptides made in E. coli, prior to proteolytic processing,
almost invariably will be N-formylmethionine.

The modifications can be a function of how the protein is made. For 
recombinant polypeptides, for example, the modifications will be determined by the host 
cell posttranslational modification capacity and the modification signals in the 
polypeptide amino acid sequence. Accordingly, when glycosylation is desired, a
polypeptide should be expressed in a glycosylating host, generally a eukaryotic cell.
Insect cells often carry out the same posttranslational glycosylations as mammalian cells
and, for this reason, insect cell expression systems have been developed to efficiently 
express mammalian proteins having native patterns of glycosylation. Similar 
considerations apply to other modifications. 

The same type of modification may be present in the same or varying degree at 
several sites in a given polypeptide. Also, a given polypeptide may contain more than 
one type of modification. 

Polypeptide Uses 

The protein sequences of the present invention can be used as a "query 
sequence" to perform a search against public databases to, for example, identify other 
family members or related sequences. Such searches can be performed using the 
NBLAST and XBLAST programs (version 2.0) of Altschul et al. (1990) J. Mol. Biol.
215:403-10. BLAST nucleotide searches can be performed with the NBLAST
program, score = 100, wordlength = 12 to obtain nucleotide sequences homologous to
the nucleic acid molecules of the invention. BLAST protein searches can be
performed with the XBLAST program, score = 50, wordlength = 3 to obtain amino
acid sequences homologous to the proteins of the invention. To obtain gapped
alignments for comparison purposes, Gapped BLAST can be utilized as described in
Altschul et al. (1997) Nucleic Acids Res. 25(17):3389-3402. When utilizing BLAST
and Gapped BLAST programs, the default parameters of the respective programs
(e.g., XBLAST and NBLAST) can be used. See http://www.ncbi.nlm.nih.gov.

Sulfatase polypeptides are useful for producing antibodies specific for sulfatase,
regions, or fragments. Regions having a high antigenicity index score are shown in
Figures 3, 7, 12, and 17.

Sulfatase polypeptides are useful for biological assays related to sulfatases. Such
assays involve any of the known sulfatase functions or activities or properties useful for
diagnosis and treatment of sulfatase-related conditions, including those in the references
cited herein, which are incorporated by reference for these assays, functions, and
disorders.

These assays include, but are not limited to, binding to and/or cleaving specific
substrates to produce fragments, steady state levels of sulfated compounds, cysteine
modification, and biological assays related to the functions produced by sulfated
compounds. Specific substrates useful for assays related to sulfate conjugate hydrolysis
include but are not limited to xenobiotics, thyroid hormones, steroids, and catechols.
Specific sulfate conjugates include, but are not limited to, 3α-sulfatolithocholyltaurine,
sulfate conjugates of estrone, 4-methylumbelliferone, and harmol, sulfated cartilage and
proteoglycans, 4-nitrophenol, simple phenols, hydroxyarylamines, iodothyronines,
catecholamines, 1-naphthyl, salbutamol, estrogens, ethinylestradiol, equilenin,
diethylstilbestrol, androgens, cholesterol bile salts, pregnenolone, benzylic alcohols,
glycolipid sulfates, complex carbohydrates such as dermatan and chondroitin sulfate,
steroid sulfate, sulfate conjugates of xenobiotics, cholesterol sulfate, xenobiotic phenyls,
o-cresol, vanillin, eugenol, m-cresol, thymol, ethyl-4,4-dihydroxybenzoate, p-cresol,
sesamol, methyl-2,6-dihydroxy-4-methylbenzyloate, methyl-2,4-dihydroxybenzoate,
methyl-3,5-dihydroxybenzoate, tyramine, dopamine, 5-hydroxytryptamine, pyrogallol,
4-nitrocatechol sulfate, estrone sulfate, metabolites of the cytochrome P450
mono-oxygenase system, dehydroepiandrosterone sulfate (DHEAS), minoxidil, cicletanine,
sulfated mutagens and carcinogens, such as aromatic amines (including heterocyclic
amines), and benzylic alcohols of chemicals such as polycyclic aromatic hydrocarbons,
safrole and estragole, glycosaminoglycans, sulfolipids, betahydroxysteroids, sulfate
esters of chromogenic or fluorogenic aromatic compounds, cerebroside sulfate, keratan
sulfate, and heparan sulfate. Substrates also include any in the references cited herein,
which are incorporated herein by reference for these substrates. Accordingly the assays
include, but are not limited to, these sulfated substrates and biological effects of sulfation
or desulfation of these substrates and associated biochemical, cellular, or phenotypic
effects of sulfation or desulfation, and any of the other biological or functional properties
of these proteins, including, but not limited to, those disclosed herein, and in any
reference cited herein which is incorporated herein by reference for the disclosure of
these properties and for the assays based on these properties. Further, assays may relate
to changes in the protein, per se, and on the effects of these changes, for example,
activation of the sulfatase by modification of a cysteine residue as disclosed herein,
cleavage of the propeptide by a proteinase, induction of expression of the protein in vivo,
inhibition of function, as well as any other effects on the protein mentioned herein or
cited in any reference herein, which are incorporated herein by reference for these effects
and for the subsequent biological consequences of these effects.

Sulfatase polypeptides are also useful in drug screening assays, in cell-based or
cell-free systems. Cell-based systems can be native, i.e., cells that normally express
sulfatase, such as those discussed above, especially tumor cells, as a biopsy, or expanded
in cell culture. In one embodiment, however, cell-based assays involve recombinant
host cells expressing sulfatase. Accordingly, these drug-screening assays can be based
on effects on protein function as described above for biological assays useful for
diagnosis and treatment.

Determining the ability of the test compound to interact with a sulfatase can also 
comprise determining the ability of the test compound to preferentially bind to the 
polypeptide as compared to the ability of a known binding molecule to bind to the 
polypeptide. 

The polypeptides can be used to identify compounds that modulate sulfatase 
activity. Such compounds, for example, can increase or decrease affinity or rate of 
binding to substrate, compete with substrate for binding to sulfatase, or displace substrate 
bound to sulfatase. Both sulfatase and appropriate variants and fragments can be used in 
high-throughput screens to assay candidate compounds for the ability to bind to 
sulfatase. These compounds can be further screened against a functional sulfatase to 
determine the effect of the compound on sulfatase activity. Compounds can be
identified that activate (agonist) or inactivate (antagonist) sulfatase to a desired degree.
Modulatory methods can be performed in vitro (e.g., by culturing the cell with the agent)
or, alternatively, in vivo (e.g., by administering the agent to a subject).

Sulfatase polypeptides can be used to screen a compound for the ability to
stimulate or inhibit interaction between sulfatase protein and a target molecule that
normally interacts with the sulfatase, for example, substrate of the sulfatase domain.
The assay includes the steps of combining sulfatase protein with a candidate
compound under conditions that allow the sulfatase protein or fragment to interact
with the target molecule, and to detect the formation of a complex between the
sulfatase protein and the target or to detect the biochemical consequence of the
interaction with the sulfatase and the target.

Determining the ability of the sulfatase to bind to a target molecule can also be
accomplished using a technology such as real-time Biomolecular Interaction Analysis
(BIA). Sjolander et al. (1991) Anal. Chem. 63:2338-2345 and Szabo et al. (1995)
Curr. Opin. Struct. Biol. 5:699-705. As used herein, "BIA" is a technology for
studying biospecific interactions in real time, without labeling any of the interactants
(e.g., BIAcore™). Changes in the optical phenomenon surface plasmon resonance
(SPR) can be used as an indication of real-time reactions between biological
molecules.

The test compounds of the present invention can be obtained using any of the
numerous approaches in combinatorial library methods known in the art, including:
biological libraries; spatially addressable parallel solid phase or solution phase
libraries; synthetic library methods requiring deconvolution; the 'one-bead
one-compound' library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is limited to polypeptide
libraries, while the other four approaches are applicable to polypeptide, non-peptide
oligomer or small molecule libraries of compounds (Lam, K.S. (1997) Anticancer
Drug Des. 12:145).

Examples of methods for the synthesis of molecular libraries can be found in
the art, for example in DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb
et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med.
Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carrell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl.
33:2061; and in Gallop et al. (1994) J. Med. Chem. 37:1233. Libraries of compounds
may be presented in solution (e.g., Houghten (1992) Biotechniques 13:412-421), or on
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556),
bacteria (Ladner USP 5,223,409), spores (Ladner USP '409), plasmids (Cull et al.
(1992) Proc. Natl. Acad. Sci. USA 89:1865-1869) or on phage (Scott and Smith
(1990) Science 249:386-390); (Devlin (1990) Science 249:404-406); (Cwirla et al.
(1990) Proc. Natl. Acad. Sci. 87:6378-6382); (Felici (1991) J. Mol. Biol. 222:301-
310); (Ladner supra).

Candidate compounds include, for example, 1) peptides such as soluble peptides,
including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g.,
Lam et al. (1991) Nature 354:82-84; Houghten et al. (1991) Nature 354:84-86) and
combinatorial chemistry-derived molecular libraries made of D- and/or L- configuration
amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate,
directed phosphopeptide libraries, see, e.g., Songyang et al. (1993) Cell 72:767-778); 3)
antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single
chain antibodies as well as Fab, F(ab')2, Fab expression library fragments, and
epitope-binding fragments of antibodies); 4) small organic and inorganic molecules (e.g.,
molecules obtained from combinatorial and natural product libraries); and substrate
analogs including, but not limited to, substrates disclosed herein.

One candidate compound is a soluble full-length sulfatase or fragment that 
competes for substrate. Other candidate compounds include mutant sulfatases or 
appropriate fragments containing mutations that affect sulfatase function and compete 
for substrate. Accordingly, a fragment that competes for substrate, for example with a 
higher affinity, or a fragment that binds substrate but does not process or otherwise affect 
it, is encompassed by the invention. 

The invention provides other end points to identify compounds that modulate 
(stimulate or inhibit) sulfatase activity. The assays typically involve an assay of cellular 
events that indicate sulfatase activity. Thus, the expression of genes that are up- or 
down-regulated in response to sulfatase activity can be assayed. In one embodiment, the 
regulatory region of such genes can be operably linked to a marker that is easily 
detectable, such as luciferase. Alternatively, modification of the sulfatase could also be



Any of the biological or biochemical functions mediated by the sulfatase can be
used as an endpoint assay. These include any of the biochemical or
biochemical/biological events described herein, in any reference cited herein,
incorporated by reference for these endpoint assay targets, and other functions known to
those of ordinary skill in the art. Specific end points can include, but are not limited to,
the events resulting from expression (or lack thereof) of sulfatase activity. With respect
to disorders, this would include, but not be limited to, effects on function,
differentiation, and proliferation, which can be assayed, as well as the biological
effects of function, such as disorders discussed hereinabove and in the references
cited hereinabove which are incorporated herein by reference for the disorders
disclosed in those references and other disorders and pathology. In the case of the
22438 sulfatase, models of pain can be used as an end point. In the case of the 23553
and 25278 sulfatases, tumor progression can be used as an end point. In the case of
the 26212 sulfatase, tumor angiogenesis and/or tumor progression can be used as an
end point.

Binding and/or activating compounds can also be screened by using chimeric 
sulfatase proteins in which one or more regions, segments, sites, and the like, as 
disclosed herein, or parts thereof, can be replaced by heterologous and homologous 
counterparts derived from other sulfatases. For example, a catalytic region can be used 
that interacts with a different substrate specificity and/or affinity than the native 
sulfatase. Accordingly, a different set of components is available as an end-point assay 
for activation. As a further alternative, the site of modification by an effector protein, for 
example, activation or phosphorylation, can be replaced with the site for a different 
effector protein. Activation can also be detected by a reporter gene containing an easily 
detectable coding region operably linked to a transcriptional regulatory sequence that is 
part of the native pathway in which sulfatase is involved. 

Sulfatase polypeptides are also useful in competition binding assays in methods 
designed to discover compounds that interact with the sulfatase. Thus, a compound is 
exposed to a sulfatase polypeptide under conditions that allow the compound to bind or 
to otherwise interact with the polypeptide. Soluble sulfatase polypeptide is also added to 
the mixture. If the test compound interacts with the soluble sulfatase polypeptide, it 
decreases the amount of complex formed or activity from the sulfatase target. This type 
of assay is particularly useful in cases in which compounds are sought that interact with 
specific regions of the sulfatase. Thus, the soluble polypeptide that competes with the 
target sulfatase region is designed to contain peptide sequences corresponding to the 
region of interest. 

Another type of competition-binding assay can be used to discover compounds 
that interact with specific functional sites. As an example, a bindable substrate analog and 
a candidate compound can be added to a sample of the sulfatase. Compounds that 
interact with the sulfatase at the same site as the substrate or analog will reduce the 
amount of complex formed between the sulfatase and the substrate or analog. 
Accordingly, it is possible to discover a compound that specifically prevents interaction 
between the sulfatase and the component. Another example involves adding a candidate 
compound to a sample of sulfatase and cleavable substrate. A compound that competes 
with the substrate will reduce the amount of hydrolysis or binding of the substrate to the 
sulfatase. Accordingly, compounds can be discovered that directly interact with the 
sulfatase and compete with the substrate. Such assays can involve any other component 
that interacts with the sulfatase. 
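
As an illustration only, the read-out of a competition assay of this kind can be expressed as the percent reduction in complex formation (or substrate hydrolysis) caused by a candidate compound. The following is a minimal Python sketch. The function names, the optional background subtraction, and the sample signal values are hypothetical and are not taken from this disclosure.

```python
def percent_inhibition(signal_with_candidate, signal_without_candidate, background=0.0):
    """Percent reduction in complex signal caused by a candidate compound.

    All arguments are assay read-outs in the same arbitrary units
    (e.g. counts of labeled complex); background is an assumed blank value.
    """
    control = signal_without_candidate - background
    if control <= 0:
        raise ValueError("control signal must exceed background")
    treated = signal_with_candidate - background
    return 100.0 * (1.0 - treated / control)


def rank_candidates(signals, control_signal, background=0.0):
    """Sort candidate compounds from strongest to weakest competitor."""
    scored = {
        name: percent_inhibition(signal, control_signal, background)
        for name, signal in signals.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical read-outs: complex signal measured with each candidate present.
    readouts = {"compound_a": 900.0, "compound_b": 250.0}
    for name, inhibition in rank_candidates(readouts, control_signal=1000.0):
        print(name, round(inhibition, 1))
```

A compound that competes at the substrate site would appear near the top of such a ranking; what level of inhibition counts as a hit is left to the practitioner.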

To perform cell-free drug screening assays, it is desirable to immobilize either the 
sulfatase, or a fragment, or its target molecule to facilitate separation of complexes from 
uncomplexed forms of one or both of the proteins, as well as to accommodate 
automation of the assay. 

Techniques for immobilizing proteins on matrices can be used in the drug 
screening assays. In one embodiment, a fusion protein can be provided which adds a 
domain that allows the protein to be bound to a matrix. For example, glutathione-S- 
transferase/sulfatase fusion proteins can be adsorbed onto glutathione sepharose beads 
(Sigma Chemical, St. Louis, MO) or glutathione derivatized microtitre plates, which are 
then combined with the cell lysates (e.g., 35S-labeled) and the candidate compound, and 
the mixture incubated under conditions conducive to complex formation (e.g., at 
physiological conditions for salt and pH). Following incubation, the beads are washed to 
remove any unbound label, and the matrix immobilized and radiolabel determined 
directly, or in the supernatant after the complexes are dissociated. Alternatively, the 
complexes can be dissociated from the matrix, separated by SDS-PAGE, and the level of 
sulfatase-binding protein found in the bead fraction quantitated from the gel using 
standard electrophoretic techniques. For example, either the polypeptide or its target 
molecule can be immobilized utilizing conjugation of biotin and streptavidin using 
techniques well known in the art. Alternatively, antibodies reactive with the protein but 
which do not interfere with binding of the protein to its target molecule can be 
derivatized to the wells of the plate, and the protein trapped in the wells by antibody 
conjugation. Preparations of a sulfatase-binding target component, such as substrate or 
activating enzyme, and a candidate compound are incubated in sulfatase-presenting 
wells and the amount of complex trapped in the well can be quantitated. Methods for 
detecting such complexes, in addition to those described above for the GST-immobilized 
complexes, include immunodetection of complexes using antibodies reactive with the 
sulfatase target molecule, or which are reactive with the sulfatase and compete with the 
target molecule; as well as enzyme-linked assays which rely on detecting an enzymatic 



Modulators of sulfatase activity identified according to these drug screening 
assays can be used to treat a subject with a disorder related to the sulfatase, by treating 
cells that express the sulfatase. These methods of treatment include the steps of 
administering the modulators of sulfatase activity in a pharmaceutical composition as 
described herein, to a subject in need of such treatment. 

The 23553, 25278, and 26212 sulfatases are differentially expressed in tumor 
cells as disclosed herein. Accordingly, these sulfatases are relevant to these disorders 
and relevant as well to differentiation, function, and growth of the tissues giving rise to 
the tumors. The 22438 sulfatase is expressed as described above, and accordingly is 
relevant for disorders involving these tissues. Disorders include, but are not limited to, 
those discussed hereinabove. Moreover, since the gene is expressed in the central 
nervous system, this sulfatase is relevant for the treatment of pain. 

Sulfatase polypeptides are thus useful for treating a sulfatase-associated disorder 
characterized by aberrant expression or activity of a sulfatase. "Aberrant expression" 
or "misexpression", as used herein, refers to a non-wild type pattern of gene 
expression, at the RNA or protein level. It includes: expression at non-wild type 
levels, i.e., over or under expression; a pattern of expression that differs from wild 
type in terms of the time or stage at which the gene is expressed, e.g., increased or 
decreased expression (as compared with wild type) at a predetermined developmental 
period or stage; a pattern of expression that differs from wild type in terms of 
decreased expression (as compared with wild type) in a predetermined cell type or 
tissue type; a pattern of expression that differs from wild type in terms of the splicing 
size, amino acid sequence, post-translational modification, or biological activity of the 
expressed polypeptide; a pattern of expression that differs from wild type in terms of 
the effect of an environmental stimulus or extracellular stimulus on expression of the 
gene, e.g., a pattern of increased or decreased expression (as compared with wild 
type) in the presence of an increase or decrease in the strength of the stimulus. 

In one embodiment, the method involves administering an agent (e.g., an 
agent identified by a screening assay described herein), or combination of agents that 
modulates (e.g., upregulates or downregulates) expression or activity of the protein. 
In another embodiment, the method involves administering sulfatase as therapy to 
compensate for reduced or aberrant expression or activity of the protein. 

Methods for treatment include but are not limited to the use of soluble sulfatase 
or fragments of sulfatase protein that compete for substrate or any other component that 
directly interacts with sulfatase, or any of the enzymes that modify the sulfatase. These 
sulfatases or fragments can have a higher affinity for the target so as to provide effective 
competition. 

Stimulation of activity is desirable in situations in which the protein is 
abnormally downregulated and/or in which increased activity is likely to have a 
beneficial effect. Likewise, inhibition of activity is desirable in situations in which 
the protein is abnormally upregulated and/or in which decreased activity is likely to 
have a beneficial effect. In one example of such a situation, a subject has a disorder 
characterized by aberrant development or cellular differentiation. In another example, 
the subject has a disorder characterized by an aberrant hematopoietic response. In 
another example, it is desirable to achieve tissue regeneration in a subject. 

In yet another aspect of the invention, the proteins of the invention can be used 
as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Patent 
No. 5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. 
Chem. 268:12046-12054; Bartel et al. (1993) Biotechniques 14:920-924; Iwabuchi et 
al. (1993) Oncogene 8:1693-1696; and Brent WO 94/10300), to identify other 
proteins (captured proteins) which bind to or interact with the proteins of the 
invention and modulate their activity. 

Sulfatase polypeptides also are useful to provide a target for diagnosing a disease 
or predisposition to disease mediated by the sulfatase, including, but not limited to, those 
diseases disclosed herein, in the references cited herein, and as disclosed above in the 
background. Accordingly, methods are provided for detecting the presence, or levels of 
the sulfatase in a cell, tissue, or organism. The method involves contacting a biological 
sample with a compound capable of interacting with the sulfatase such that the 
interaction can be detected. One agent for detecting a sulfatase is an antibody capable of 
selectively binding to the sulfatase. A biological sample includes tissues, cells and 
biological fluids isolated from a subject, as well as tissues, cells and fluids present within 
a subject. 

The sulfatase also provides a target for diagnosing active disease, or 
predisposition to disease, in a patient having a variant sulfatase. Thus, sulfatase can be 
isolated from a biological sample and assayed for the presence of a genetic mutation that 
results in an aberrant protein. This includes amino acid substitution, deletion, insertion, 
rearrangement (as the result of aberrant splicing events), and inappropriate post- 
translational modification. Analytic methods include altered electrophoretic mobility, 
altered tryptic peptide digest, altered sulfatase activity in cell-based or cell-free assays, 
such as by alteration in substrate binding or degradation, or ability to be activated by the 
activation enzyme, or antibody-binding pattern, altered isoelectric point, direct amino 
acid sequencing, and any other of the known assay techniques useful for detecting 
mutations in a protein in general or in a sulfatase specifically, such as are disclosed 
herein. 

In vitro techniques for detection of sulfatase include enzyme linked 
immunosorbent assays (ELISAs), Western blots, immunoprecipitations and 
immunofluorescence. Alternatively, the protein can be detected in vivo in a subject by 
introducing into the subject a labeled anti-sulfatase antibody. For example, the antibody 
can be labeled with a radioactive marker whose presence and location in a subject can be 
detected by standard imaging techniques. Particularly useful are methods that detect 
the allelic variant of sulfatase expressed in a subject, and methods that detect 
fragments of sulfatase in a sample. 

Sulfatase polypeptides are also useful in pharmacogenomic analysis. 
Pharmacogenomics deals with clinically significant hereditary variations in the response 
to drugs due to altered drug disposition and abnormal action in affected persons. See, 
e.g., Eichelbaum, M. (1996) Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985, and 
Linder, M.W. (1997) Clin. Chem. 43(2):254-266. The clinical outcomes of these 
variations result in severe toxicity of therapeutic drugs in certain individuals or 
therapeutic failure of drugs in certain individuals as a result of individual variation in 
metabolism. Thus, the genotype of the individual can determine the way a therapeutic 
compound acts on the body or the way the body metabolizes the compound. Further, the 
activity of drug metabolizing enzymes affects both the intensity and duration of drug 
action. Thus, the pharmacogenomics of the individual permit the selection of effective 
compounds and effective dosages of such compounds for prophylactic or therapeutic 
treatment based on the individual's genotype. The discovery of genetic polymorphisms 
in some drug metabolizing enzymes has explained why some patients do not obtain the 
expected drug effects, show an exaggerated drug effect, or experience serious toxicity 
from standard drug dosages. Polymorphisms can be expressed in the phenotype of the 
extensive metabolizer and the phenotype of the poor metabolizer. Accordingly, genetic 
polymorphism may lead to allelic protein variants of sulfatase in which one or more of 
sulfatase functions in one population is different from those in another population. The 
polypeptides thus allow a target to ascertain a genetic predisposition that can affect 
treatment modality. Thus, in a peptide-based treatment, polymorphism may give rise to 
catalytic regions that are more or less active. Accordingly, dosage would necessarily be 
modified to maximize the therapeutic effect within a given population containing the 
polymorphism. As an alternative to genotyping, specific polymorphic polypeptides 
could be identified. 

Sulfatase polypeptides are also useful for monitoring therapeutic effects during 
clinical trials and other treatment. Thus, the therapeutic effectiveness of an agent that is 
designed to increase or decrease gene expression, protein levels or sulfatase activity can 
be monitored over the course of treatment using sulfatase polypeptides as an end-point 
target. The monitoring can be, for example, as follows: (i) obtaining a pre- 
administration sample from a subject prior to administration of the agent; (ii) 
detecting the level of expression or activity of the protein in the pre-administration 
sample; (iii) obtaining one or more post-administration samples from the subject; (iv) 
detecting the level of expression or activity of the protein in the post-administration 
samples; (v) comparing the level of expression or activity of the protein in the pre- 
administration sample with the protein in the post-administration sample or samples; 
and (vi) increasing or decreasing the administration of the agent to the subject 
accordingly. 
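
Steps (v) and (vi) above can be sketched in code as a comparison of pre- and post-administration levels followed by a dosing suggestion. The following Python example is a minimal illustration under stated assumptions. The target fold change, the 10% tolerance, the function names, and the decision rule are hypothetical and are not prescribed by this disclosure.

```python
from statistics import mean


def fold_change(pre_level, post_levels):
    """Step (v): mean post-administration level relative to the pre-administration level."""
    if pre_level <= 0:
        raise ValueError("pre-administration level must be positive")
    if not post_levels:
        raise ValueError("at least one post-administration sample is required")
    return mean(post_levels) / pre_level


def suggest_adjustment(pre_level, post_levels, target_fold, tolerance=0.10):
    """Step (vi): suggest increasing, decreasing, or maintaining administration.

    target_fold is the assumed desired fold change (greater than 1 for an agent
    designed to increase the level, less than 1 for one designed to decrease it).
    """
    if target_fold == 1.0:
        raise ValueError("target_fold must differ from 1.0")
    observed = fold_change(pre_level, post_levels)
    # Fraction of the intended change actually achieved (1.0 means on target).
    achieved = (observed - 1.0) / (target_fold - 1.0)
    if achieved < 1.0 - tolerance:
        return "increase"
    if achieved > 1.0 + tolerance:
        return "decrease"
    return "maintain"


if __name__ == "__main__":
    # Hypothetical activity read-outs in arbitrary units.
    print(suggest_adjustment(100.0, [120.0, 140.0], target_fold=2.0))
    print(suggest_adjustment(100.0, [40.0], target_fold=0.5))
```

In practice the comparison would be made against clinically validated thresholds; the single tolerance parameter here only stands in for that judgment.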



Antibodies 

The invention also provides antibodies that selectively bind to the sulfatase and 
its variants and fragments. An antibody is considered to selectively bind, even if it also 
binds to other proteins that are not substantially homologous with the sulfatase. These 
other proteins share homology with a fragment or domain of sulfatase. This 
conservation in specific regions gives rise to antibodies that bind to both proteins by 
virtue of the homologous sequence. In this case, it would be understood that antibody 
binding to the sulfatase is still selective. 

Antibodies can be polyclonal or monoclonal. An intact antibody, or a fragment 
thereof (e.g., Fab or F(ab')2) can be used. An appropriate immunogenic preparation can 
be derived from native, recombinantly expressed, or chemically synthesized peptides. 

To generate antibodies, an isolated sulfatase polypeptide is used as an 
immunogen with standard techniques for polyclonal and 
monoclonal antibody preparation. Either the full-length protein or antigenic peptide 
fragment can be used. Regions having a high antigenicity index are disclosed 
hereinabove. 

Antibodies are preferably prepared from these regions or from discrete 
fragments in these regions. However, antibodies can be prepared from any region of 
the peptide as described herein. A preferred fragment produces an antibody that 
diminishes or completely prevents substrate hydrolysis or binding. Antibodies can be 
developed against the entire sulfatase or domains of the sulfatase as described herein, 
for example, the substrate binding region, sulfatase motif, or subregions thereof. 
Antibodies can also be developed against other specific functional sites as disclosed 
herein. 

The antigenic peptide can comprise a contiguous sequence of at least 12, 14, 15- 
20, 20-25, or 25-30 or more amino acid residues. In one embodiment, fragments 
correspond to regions that are located on the surface of the protein, e.g., hydrophilic 
regions. These fragments are not to be construed, however, as encompassing any 
fragments that may be disclosed prior to the invention. 

Detection can be facilitated by coupling (i.e., physically linking) the antibody to 
a detectable substance. Examples of detectable substances include various enzymes, 
prosthetic groups, fluorescent materials, luminescent materials, bioluminescent 
materials, and radioactive materials. Examples of suitable enzymes include horseradish 
peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of 
suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; 
examples of suitable fluorescent materials include umbelliferone, fluorescein, 
fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl 
chloride or phycoerythrin; an example of a luminescent material includes luminol; 
examples of bioluminescent materials include luciferase, luciferin, and aequorin, and 
examples of suitable radioactive material include 125I, 131I, 35S or 3H. 

Antibody Uses 

The antibodies can be used to isolate a sulfatase by standard techniques, such as 
affinity chromatography or immunoprecipitation. The antibodies can facilitate the 
purification of the natural sulfatase from cells and recombinantly produced sulfatase 
expressed in host cells. 

The antibodies are useful to detect the presence of a sulfatase in cells or tissues to 
determine the pattern of expression of the sulfatase among various tissues in an organism 
and over the course of normal development. The antibodies can be used to detect a 
sulfatase in situ, in vitro, or in a cell lysate or supernatant in order to evaluate the 
abundance and pattern of expression. Antibody detection of circulating fragments of the 
full length sulfatase can be used to identify sulfatase turnover. In addition, the antibodies 
can be used to assess abnormal tissue distribution or abnormal expression during 
development. 

Further, the antibodies can be used to assess sulfatase expression in disease states 
such as in active stages of the disease or in an individual with a predisposition toward 
disease related to sulfatase function. When a disorder is caused by an inappropriate 
tissue distribution, developmental expression, or level of expression of sulfatase protein, 
the antibody can be prepared against the normal sulfatase protein. If a disorder is 
characterized by a specific mutation in sulfatase, antibodies specific for this mutant 
protein can be used to assay for the presence of the specific mutant sulfatase. However, 
intracellularly-made antibodies ("intrabodies") are also encompassed, which would 
recognize intracellular sulfatase peptide regions. 

The antibodies can also be used to assess normal and aberrant subcellular 
localization of cells in the various tissues in an organism. Antibodies can be developed 



The diagnostic uses can be applied, not only in genetic testing, but also in 
monitoring a treatment modality. Accordingly, where treatment is ultimately aimed at 
correcting sulfatase expression level or the presence of aberrant sulfatases and aberrant 
tissue distribution or developmental expression, antibodies directed against the sulfatase 
or relevant fragments can be used to monitor therapeutic efficacy. 

Additionally, antibodies are useful in pharmacogenomic analysis. Thus, 
antibodies prepared against polymorphic sulfatase can be used to identify individuals 
that require modified treatment modalities. 

The antibodies are also useful as diagnostic tools as an immunological marker 
for aberrant sulfatase analyzed by electrophoretic mobility, isoelectric point, tryptic 
peptide digest, and other physical assays known to those in the art. 

The antibodies are also useful for tissue typing. Thus, where a specific sulfatase 
has been correlated with expression in a specific tissue, antibodies that are specific for 
this sulfatase can be used to identify a tissue type. 

The antibodies are also useful in forensic identification. Accordingly, where an 
individual has been correlated with a specific genetic polymorphism resulting in a 
specific polymorphic protein, an antibody specific for the polymorphic protein can be 
used as an aid in identification. 

The antibodies are also useful for inhibiting sulfatase function, for example, 
substrate binding, or sulfatase activity. For example, sulfatase activity may be measured 
by the ability to form a binding complex with a sulfated conjugate, such as disclosed 
herein. 

These uses can also be applied in a therapeutic context in which treatment 
involves inhibiting sulfatase function. An antibody can be used, for example, to block 
substrate binding. Antibodies can be prepared against specific fragments containing 
sites required for function or against intact sulfatase associated with a cell. 

Completely human antibodies are particularly desirable for therapeutic treatment 
of human patients. For an overview of this technology for producing human 
antibodies, see Lonberg et al. (1995) Int. Rev. Immunol. 13:65-93. For a detailed 
discussion of this technology for producing human antibodies and human monoclonal 
antibodies and protocols for producing such antibodies, see, e.g., U.S. Patent 5,625,126; 
U.S. Patent 5,633,425; U.S. Patent 5,569,825; U.S. Patent 5,661,016; and U.S. Patent 
5,545,806. 






of a sulfatase protein in a biological sample. The kit can comprise antibodies such as a 
labeled or labelable antibody and a compound or agent for detecting the sulfatase in a 
biological sample; means for determining the amount of sulfatase in the sample; and 
means for comparing the amount of sulfatase in the sample with a standard. The 
compound or agent can be packaged in a suitable container. The kit can further 
comprise instructions for using the kit to detect the sulfatase. 



The nucleotide sequences in SEQ ID NOS:2, 4, 6, and 8 were obtained by 
sequencing the deposited human cDNAs. Accordingly, the sequences of the deposited 
clones are controlling as to any discrepancies between the two and any reference to a 
sequence of SEQ ID NOS:2, 4, 6, or 8, includes reference to the sequence of the 
deposited cDNA. 

The specifically disclosed cDNA comprises the coding region and 5' and 3' 
untranslated sequences in SEQ ID NOS:2, 4, 6, or 8. The coding sequences of the 
cDNAs are set forth in SEQ ID NOS:11, 12, 13, and 14. 

The invention provides isolated polynucleotides encoding the novel sulfatases. 
The term "sulfatase polynucleotide" or "sulfatase nucleic acid" refers to the sequences 
shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or in the deposited cDNAs. The 
term "sulfatase polynucleotide" or "sulfatase nucleic acid" further includes variants and 
fragments of sulfatase polynucleotides. 

Generally, nucleotide sequence variants of the invention will have at least 
60%, 65%, 70%, 75%, 80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 
98%, or 99% identity to one of the nucleotide sequences disclosed herein. 
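
For illustration, a percent identity figure of this kind can be computed from two sequences that have already been aligned. The Python sketch below is a minimal example, not the alignment method of this disclosure. It assumes pre-aligned, equal-length strings with "-" marking gaps and counts every column in which at least one sequence has a residue; other denominator conventions exist. The example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns in which both sequences carry the same base.

    Columns that are gaps in both sequences are skipped; a gap opposite a base
    counts as a mismatch (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def meets_identity_threshold(aligned_a, aligned_b, threshold_percent):
    """True if the pair reaches the given percent identity (e.g. 60, 85, 99)."""
    return percent_identity(aligned_a, aligned_b) >= threshold_percent


if __name__ == "__main__":
    reference = "ATGCATGCAT"  # hypothetical reference fragment
    variant = "ATGCATGAAT"    # hypothetical variant with one substitution
    print(percent_identity(reference, variant))
    print(meets_identity_threshold(reference, variant, 85))
```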

An "isolated" sulfatase nucleic acid is one that is separated from other nucleic 
acid present in the natural source of sulfatase nucleic acid. Preferably, an "isolated" 
nucleic acid is free of sequences which naturally flank sulfatase nucleic acid (i.e., 

sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the 
organism from which the nucleic acid is derived. However, there can be some flanking 
nucleotide sequences, for example up to about 5 kb. The important point is that the 
sulfatase nucleic acid is isolated from flanking sequences such that it can be subjected to 
the specific manipulations described herein, such as recombinant expression, preparation 
of probes and primers, and other uses specific to the sulfatase nucleic acid sequences. In 
one embodiment, the sulfatase nucleic acid comprises only the coding region. 

Moreover, an "isolated" nucleic acid molecule, such as a cDNA or RNA 
molecule, can be substantially free of other cellular material, or culture medium when 
produced by recombinant techniques, or chemical precursors or other chemicals when 
chemically synthesized. However, the nucleic acid molecule can be fused to other 
coding or regulatory sequences and still be considered isolated. 

In some instances, the isolated material will form part of a composition (for 
example, a crude extract containing other substances), buffer system or reagent mix. 
In other circumstances, the material may be purified to essential homogeneity, for 

example as determined by PAGE or column chromatography such as HPLC. 
Preferably, an isolated nucleic acid comprises at least about 50, 80 or 90% (on a 
molar basis) of all macromolecular species present. 

For example, recombinant DNA molecules contained in a vector are considered 
isolated. Further examples of isolated DNA molecules include recombinant DNA 

molecules maintained in heterologous host cells or purified (partially or substantially) 
DNA molecules in solution. Isolated RNA molecules include in vivo or in vitro RNA 
transcripts of the isolated DNA molecules of the present invention. Isolated nucleic acid 

The invention further provides variant sulfatase polynucleotides, and fragments 
thereof, that differ from the nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 
12, 13, or 14 due to degeneracy of the genetic code and thus encode the same protein as 
that encoded by a nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 
14. 
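
Degeneracy of the genetic code can be illustrated with a short Python sketch that translates two different nucleotide sequences with the standard genetic code and confirms that they encode the same peptide. The two example sequences are hypothetical and are not taken from any SEQ ID NO of this disclosure; the function names are likewise illustrative.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def encode_same_protein(dna_a, dna_b):
    """True if two coding sequences translate to the same amino acid sequence."""
    return translate(dna_a) == translate(dna_b)


if __name__ == "__main__":
    seq_a = "ATGCTGAGCGGT"  # hypothetical coding fragment
    seq_b = "ATGTTATCAGGC"  # differs at synonymous codon positions only
    print(translate(seq_a), translate(seq_b), encode_same_protein(seq_a, seq_b))
```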

Alternatively, a nucleic acid molecule that is a fragment of a 22438-like 
nucleotide sequence of the present invention comprises a nucleotide sequence 
consisting of nucleotides 1-100, 100-200, 200-300, 300-400, 400-500, 500-600, 600- 
700, 700-900, 900-1000, 1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 
1500-1600, 1600-1700, 1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2175 of 
SEQ ID NO:2. 

A nucleic acid molecule that is a fragment of a 23553-like nucleotide sequence 
of the present invention comprises a nucleotide sequence consisting of nucleotides 1- 
100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000, 
1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 
1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 
2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, 2900-3000, 3000-3100, 
3100-3200, 3200-3300, 3300-3400, 3400-3500, 3500-3600, 3600-3700, 3700-3800, 
3800-3900, 3900-4000, 4000-4100, 4100-4200, 4200-4300, 4300-4321 of SEQ ID 
NO:4. 

A nucleic acid molecule that is a fragment of a 25278-like nucleotide sequence 
of the present invention comprises a nucleotide sequence consisting of nucleotides 1- 
100, 100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000, 
1000-1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 
1700-1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2300, 2300-2400, 
2400-2500, 2500-2600, 2600-2700, 2700-2800, 2800-2900, 2900-2940 of SEQ ID 
NO:6. 

A nucleic acid molecule that is a fragment of a 26212-like nucleotide sequence of the 
present invention comprises a nucleotide sequence consisting of nucleotides 1-100, 
100-200, 200-300, 300-400, 400-500, 500-600, 600-700, 700-900, 900-1000, 1000- 
1100, 1100-1200, 1200-1300, 1300-1400, 1400-1500, 1500-1600, 1600-1700, 1700- 
1800, 1800-1900, 1900-2000, 2000-2100, 2100-2200, 2200-2253 of SEQ ID NO:8. 

The invention also provides sulfatase nucleic acid molecules encoding the 
variant polypeptides described herein. Such polynucleotides may be naturally occurring, 
such as allelic variants (same locus), homologs (different locus), and orthologs (different 
organism), or may be constructed by recombinant DNA methods or by chemical 
synthesis. Such non-naturally occurring variants may be made by mutagenesis 
techniques, including those applied to polynucleotides, cells, or organisms. Accordingly, 
as discussed above, the variants can contain nucleotide substitutions, deletions, 
inversions and insertions. 

Typically, variants have a substantial identity with a nucleic acid molecule of 
SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, and the complements thereof. Variation can 
occur in either or both the coding and non-coding regions. The variations can produce 
both conservative and non-conservative amino acid substitutions. 

Orthologs, homologs, and allelic variants can be identified using methods well 
known in the art. These variants comprise a nucleotide sequence encoding a sulfatase 
that is typically at least about 40-45%, 45-50%, 50-55%, 55-60%, 60-65%, 65-70%, 70- 
75%, more typically at least about 75-80% or 80-85%, and most typically at least about 
85-90% or 90-95% or more homologous to the nucleotide sequence shown in SEQ ID 
NOS:2, 4, 6 or 8, or a fragment of this sequence. Such nucleic acid molecules can 
readily be identified as being able to hybridize under stringent conditions, to the 
nucleotide sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or a fragment of 
the sequence. 

In the case of the 23553 sulfatase, in one embodiment, a variant is greater than 
65% homologous with respect to nucleotide sequence. For the 25278 sulfatase, in one 
embodiment, a variant is greater than 50-60% homologous with respect to nucleotide 
sequence. With respect to the 26212 sulfatase, in one embodiment, a variant is greater 
than about 65-75% homologous with respect to nucleotide sequence. 

It is understood that stringent hybridization does not indicate substantial 
homology where it is due to general homology, such as polyA+ sequences, or sequences 
common to all or most proteins, sulfatases, arylsulfatases, glucosamine-6-sulfatases, N- 
acetylgalactosamine-4-sulfatases, or any of the sulfatases to which the sulfatases of the 
present invention have shown homology by BLAST analysis, for example, regions to 
arylsulfatases A, B, C, D, E, F, EDS, and the like. Moreover, it is understood that 
variants do not include any of the nucleic acid sequences that may have been disclosed 
prior to the invention. 

As used herein, the term "hybridizes under stringent conditions" describes 
conditions for hybridization and washing. Stringent conditions are known to those 
skilled in the art and can be found in Current Protocols in Molecular Biology, John 
Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. Aqueous and nonaqueous methods are 
described in that reference and either can be used. A preferred example of stringent 
hybridization conditions is hybridization in 6X sodium chloride/sodium citrate 
(SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 
50°C. Another example of stringent hybridization conditions is hybridization in 6X 
sodium chloride/sodium citrate (SSC) at about 45°C, followed by one or more washes 
in 0.2X SSC, 0.1% SDS at 55°C. A further example of stringent hybridization 
conditions is hybridization in 6X sodium chloride/sodium citrate (SSC) at about 
45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS at 60°C. Preferably, 
stringent hybridization conditions are hybridization in 6X sodium chloride/sodium 
citrate (SSC) at about 45°C, followed by one or more washes in 0.2X SSC, 0.1% SDS 
at 65°C. Particularly preferred stringency conditions (and the conditions that should 
be used if the practitioner is uncertain about what conditions should be applied to 
determine if a molecule is within a hybridization limitation of the invention) are 0.5M 
Sodium Phosphate, 7% SDS at 65°C, followed by one or more washes at 0.2X SSC, 
1% SDS at 65°C. Preferably, an isolated nucleic acid molecule of the invention that 
hybridizes under stringent conditions to the sequence of SEQ ID NOS:2, 4, 6, 8, 11, 
12, 13, or 14 corresponds to a naturally-occurring nucleic acid molecule. As used 
herein, a "naturally-occurring" nucleic acid molecule refers to an RNA or DNA 
molecule having a nucleotide sequence that occurs in nature (e.g., encodes a natural 
protein). 
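
For orientation, the salt concentrations implied by the SSC dilutions recited above can be worked out as follows. This is a minimal sketch that assumes the common laboratory definition of 1X SSC as 0.15 M sodium chloride and 0.015 M sodium citrate; that definition and the function name `ssc_molarity` are assumptions made for illustration and are not taken from this specification.

```python
# Assumed composition of 1X SSC (common laboratory definition, not stated in the text).
NACL_M_PER_1X = 0.15      # mol/L sodium chloride
CITRATE_M_PER_1X = 0.015  # mol/L sodium citrate


def ssc_molarity(fold):
    """Return (NaCl, sodium citrate) molarity for an SSC strength, e.g. 6 for 6X."""
    if fold <= 0:
        raise ValueError("SSC strength must be positive")
    return (fold * NACL_M_PER_1X, fold * CITRATE_M_PER_1X)


if __name__ == "__main__":
    for label, fold in (("hybridization, 6X SSC", 6.0), ("wash, 0.2X SSC", 0.2)):
        nacl, citrate = ssc_molarity(fold)
        print(f"{label}: {nacl:.3f} M NaCl, {citrate:.4f} M sodium citrate")
```

Under this assumption, moving from 6X SSC to a 0.2X SSC wash lowers the sodium chloride concentration thirtyfold, which, together with the higher wash temperature, is what makes the wash step the more stringent one.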

The present invention also provides isolated nucleic acids that contain a single 
or double stranded fragment or portion that hybridizes under stringent conditions to 
the nucleotide sequence of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, or the 
complements of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14. In one embodiment, the 
nucleic acid consists of a portion of a nucleotide sequence of SEQ ID NOS:2, 4, 6, 8, 
11, 12, 13, or 14 and the complements. The nucleic acid fragments of the invention 
are at least about 10-15, preferably at least about 15-20 or 20-25 contiguous 
nucleotides, and can be 30, 33, 35, 40, 50, 60, 70, 75, 80, 90, 100, 200, 500 or more 
nucleotides in length. Longer fragments, for example, 600 or more nucleotides in 
length, which encode antigenic proteins or polypeptides described herein are also 
useful. 

In the case of the 23553 sulfatase, in one embodiment, fragments are derived 
from nucleotide 1 to about nucleotide 670 and comprise 5-10 and 10-20 contiguous 
base pairs, and particularly greater than 18. For this sulfatase, in another 
embodiment, a fragment is derived from around nucleotide 3008 to 3514 and 
comprises around 5-10 and 10-20 contiguous nucleotides. In other embodiments for 
this sulfatase, a fragment is derived from around nucleotide 3994 to 4321 and is about 
5-10 or 10-20 contiguous nucleotides. For the 25278 sulfatase, in one embodiment, a 
fragment is derived from around nucleotide 130 to around nucleotide 454 and comprises a 
contiguous sequence of about 5-10 or 10-20 nucleotides. In another embodiment, the 
fragment is derived from around nucleotide 454 to around nucleotide 1400 and 
comprises around 5-10 or 10-20 contiguous nucleotides, especially a fragment greater 
than 17 nucleotides. In another embodiment the fragment is derived from around 
nucleotide 1400 to around nucleotide 1850 and comprises a contiguous sequence of 
around 5-10, 10-20, or 20-25 nucleotides, especially a fragment greater than 23 
nucleotides. In another embodiment, a fragment is derived from about nucleotide 
1933 to about nucleotide 2421. Such a fragment comprises around 5-10 or 10-20 
contiguous nucleotides. For the 26212 sulfatase, in one embodiment, a fragment is 
derived from around nucleotide 272 to around nucleotide 538 and comprises a 
contiguous sequence of around 5-10 or 10-20 nucleotides, especially a fragment 
greater than 17 nucleotides. In another embodiment, the fragment is derived from 
around nucleotide 538 to around nucleotide 751 and comprises a contiguous sequence 
of at least 5-10 or 10-20 nucleotides, especially greater than 12 nucleotides. In 
another embodiment, the fragment is derived from around nucleotide 1074 to around 
1551 and comprises a contiguous nucleotide sequence of around 5-10, 10-20, or 
20-30, especially greater than 20 nucleotides. In a further embodiment, the fragment is 
derived from around nucleotide 2052 to 2251 and comprises a contiguous sequence of 
5-10 and 10-20 nucleotides, especially fragments greater than 18 nucleotides. 

The fragment can comprise DNA or RNA and can be derived from either the 
coding or the non-coding sequence. 
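
The fragment definitions above can be illustrated with a short sketch that enumerates every contiguous fragment of a chosen length lying within a stated nucleotide range. The sequence used here is a made-up placeholder, not any of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, and the function name and the 1-based, inclusive coordinate convention are assumptions.

```python
def contiguous_fragments(sequence, start, end, length):
    """Yield (position, fragment) for each contiguous fragment of `length`
    nucleotides lying wholly within positions start..end (1-based, inclusive)."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("range must lie within the sequence")
    if length < 1:
        raise ValueError("length must be at least 1")
    region = sequence[start - 1:end]
    # No fragments are produced if the region is shorter than `length`.
    for offset in range(len(region) - length + 1):
        yield start + offset, region[offset:offset + length]


if __name__ == "__main__":
    placeholder = "ATGGCCTTGACCGGTAAGCTTGCA"  # hypothetical 24-nt sequence
    for position, fragment in contiguous_fragments(placeholder, 5, 16, 10):
        print(position, fragment)
```

With real coordinates, the same call would be made with, for example, a range such as nucleotides 454 to 1400 and a fragment length of 18.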

In another embodiment an isolated sulfatase nucleic acid encodes the entire 
coding region. In another embodiment the isolated sulfatase nucleic acid encodes a 
sequence corresponding to the mature protein. Other fragments include nucleotide 
sequences encoding the amino acid fragments described herein. 

Thus, sulfatase nucleic acid fragments further include sequences corresponding 
to the regions described herein, subregions also described, and specific functional sites. 
Sulfatase nucleic acid fragments also include combinations of the regions, segments, 
motifs, and other functional sites described above. It is understood that a sulfatase 
fragment includes any nucleic acid sequence that does not include the entire gene. A 
person of ordinary skill in the art would be aware of the many permutations that are 
possible. Nucleic acid fragments, according to the present invention, are not to be 
construed as encompassing those fragments that may have been disclosed prior to the 
invention. 

Where the location of the regions or sites has been predicted by computer 
analysis, one of ordinary skill would appreciate that the amino acid residues constituting 
these regions can vary depending on the criteria used to define the regions. 

Polynucleotide Uses 

The nucleotide sequences of the present invention can be used as a "query 
sequence" to perform a search against public databases, for example, to identify other 
family members or related sequences. For more information about public databases, 
see page 26, above. 

The nucleic acid fragments of the invention provide probes or primers in 
assays such as those described below. "Probes" are oligonucleotides that hybridize in 
a base-specific manner to a complementary strand of nucleic acid. Such probes 
include polypeptide nucleic acids, as described in Nielsen et al. (1991) Science 
254:1497-1500. Typically, a probe comprises a region of nucleotide sequence that 
hybridizes under highly stringent conditions to at least about 15, typically about 
20-25, and more typically about 30, 40, 50 or 75 consecutive nucleotides of the nucleic 
acid sequence shown in SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14, and the 
complements thereof. More typically, the probe further comprises a label, e.g., 
radioisotope, fluorescent compound, enzyme, or enzyme co-factor. 

As used herein, the term "primer" refers to a single-stranded oligonucleotide 
which acts as a point of initiation of template-directed DNA synthesis using 
well-known methods (e.g., PCR, LCR) including, but not limited to, those described herein. 
The appropriate length of the primer depends on the particular use, but typically 
ranges from about 15 to 30 nucleotides. The term "primer site" refers to the area of 
the target DNA to which a primer hybridizes. The term "primer pair" refers to a set of 
primers including a 5' (upstream) primer that hybridizes with the 5' end of the nucleic 
acid sequence to be amplified and a 3' (downstream) primer that hybridizes with the 
complement of the sequence to be amplified. 
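
The primer pair definition can be sketched in a few lines. The example below follows the usual PCR convention, which is assumed here rather than taken from the text: the upstream primer has the same sequence as the 5' end of the strand supplied, and the downstream primer is the reverse complement of that strand's 3' end, so that each primer is extended toward the other. The target sequence and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, written 5'->3'."""
    return dna.translate(COMPLEMENT)[::-1]


def primer_pair(target, length=20):
    """Return (upstream, downstream) primers for `target`, both written 5'->3'.

    upstream   : identical to the first `length` bases of the given strand
    downstream : reverse complement of the last `length` bases of that strand
    """
    if length < 1 or len(target) < 2 * length:
        raise ValueError("target is too short for two non-overlapping primers")
    return target[:length], reverse_complement(target[-length:])


if __name__ == "__main__":
    hypothetical_target = "ATGGCCTTGACCGGTAAGCTTGCA"
    upstream, downstream = primer_pair(hypothetical_target, length=6)
    print("upstream  :", upstream)
    print("downstream:", downstream)
```

A practical design would also weigh primer length (the text gives about 15 to 30 nucleotides as typical), base composition, and specificity, none of which this sketch attempts.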

Sulfatase polynucleotides are thus useful for probes, primers, and in biological 
assays. Where the polynucleotides are used to assess sulfatase properties or functions, 
such as in the assays described herein, all or less than all of the entire cDNA can be 
useful. Assays specifically directed to sulfatase functions, such as assessing agonist or 
antagonist activity, encompass the use of known fragments. Further, diagnostic methods 
for assessing sulfatase function can also be practiced with any fragment, including those 
fragments that may have been known prior to the invention. Similarly, in methods 
involving treatment of sulfatase dysfunction, all fragments are encompassed, including 
those that may have been known in the art. 

Sulfatase polynucleotides are useful as a hybridization probe for cDNA and 
genomic DNA to isolate a full-length cDNA and genomic clones encoding the 
polypeptides described in SEQ ID NOS:1, 3, 5, or 7, and to isolate cDNA and genomic 
clones that correspond to variants producing the same polypeptides shown in SEQ ID 
NOS:1, 3, 5, or 7, or the other variants described herein. Variants can be isolated from 
the same tissue and organism from which a polypeptide shown in SEQ ID NOS:1, 3, 5, 
or 7 was isolated, different tissues from the same organism, or from different organisms. 
This method is useful for isolating genes and cDNA that are developmentally-controlled 
and therefore may be expressed in the same tissue or different tissues at different points 
in the development of an organism. 

The probe can correspond to any sequence along the entire length of the gene 
encoding the sulfatase polypeptide. Accordingly, it could be derived from 5' noncoding 
regions, the coding region, and 3' noncoding regions. 

The nucleic acid probe can be, for example, the full-length cDNA of SEQ ID 
NOS:2, 4, 6, 8, 11, 12, 13, or 14 or a fragment thereof, such as an oligonucleotide of at 
least 5, 10, 15, 20, 25, 30, 50, 100, 250 or 500 nucleotides in length and sufficient to 
specifically hybridize under stringent conditions to mRNA or DNA. 

Fragments of the polynucleotides described herein are also useful to synthesize 
larger fragments or full-length polynucleotides described herein, ribozymes or antisense 
molecules. For example, a fragment can be hybridized to any portion of an mRNA and a 
larger or full-length cDNA can be produced. 

Antisense nucleic acids of the invention can be designed using the nucleotide 
sequences of SEQ ID NOS:2, 4, 6, 8, 11, 12, 13, or 14 and constructed using chemical 
synthesis and enzymatic ligation reactions using procedures known in the art. For 
example, an antisense nucleic acid (e.g., an antisense oligonucleotide) can be 
chemically synthesized using naturally occurring nucleotides or variously modified 
nucleotides designed to increase the biological stability of the molecules or to increase 
the physical stability of the duplex formed between the antisense and sense nucleic 
acids, e.g., phosphorothioate derivatives and acridine substituted nucleotides can be 
used. Examples of modified nucleotides which can be used to generate the antisense 
nucleic acid include 5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 
hypoxanthine, xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil, 
5-carboxymethylaminomethyl-2-thiouridine, 5-carboxymethylaminomethyluracil, 
dihydrouracil, beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 2-methyladenine, 
2-methylguanine, 3-methylcytosine, 5-methylcytosine, N6-adenine, 7-methylguanine, 
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, 
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v), wybutoxosine, 
pseudouracil, queosine, 2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic 
acid (v), 5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and 
2,6-diaminopurine. Alternatively, the antisense nucleic acid can be produced biologically 
using an expression vector into which a nucleic acid has been subcloned in an 
antisense orientation (i.e., RNA transcribed from the inserted nucleic acid will be of 
an antisense orientation to a target nucleic acid of interest). 
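
As an illustration of the design step only, the sketch below derives an unmodified DNA antisense oligonucleotide from a segment of an mRNA by Watson-Crick pairing. The mRNA string is a hypothetical placeholder rather than a sequence of the invention, the function name is an assumption, and the modified nucleotides listed above are not modeled.

```python
# Watson-Crick partner, expressed as a DNA base, for each base of an RNA target.
RNA_TO_ANTISENSE_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}


def antisense_oligo(mrna, start, length):
    """Return a DNA oligonucleotide, written 5'->3', complementary to the
    mRNA segment beginning at `start` (1-based) and spanning `length` bases."""
    if start < 1 or length < 1 or start - 1 + length > len(mrna):
        raise ValueError("segment must lie within the mRNA")
    segment = mrna.upper()[start - 1:start - 1 + length]
    # Pair each base, walking the segment backwards so the oligo reads 5'->3'.
    return "".join(RNA_TO_ANTISENSE_DNA[base] for base in reversed(segment))


if __name__ == "__main__":
    hypothetical_mrna = "AUGGCCUUGACC"
    print(antisense_oligo(hypothetical_mrna, 1, 6))
```

The choice of target segment, oligonucleotide length, and backbone chemistry is left to the practitioner; the sketch fixes none of them.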

Additionally, the nucleic acid molecules of the invention can be modified at 
the base moiety, sugar moiety or phosphate backbone to improve, e.g., the stability, 
hybridization, or solubility of the molecule. For example, the deoxyribose phosphate 
backbone of the nucleic acids can be modified to generate peptide nucleic acids (see 
Hyrup et al. (1996) Bioorganic & Medicinal Chemistry 4:5). As used herein, the 
terms "peptide nucleic acids" or "PNAs" refer to nucleic acid mimics, e.g., DNA 
mimics, in which the deoxyribose phosphate backbone is replaced by a pseudopeptide 
backbone and only the four natural nucleobases are retained. The neutral backbone of 
PNAs has been shown to allow for specific hybridization to DNA and RNA under 
conditions of low ionic strength. The synthesis of PNA oligomers can be performed 
using standard solid phase peptide synthesis protocols as described in Hyrup et al. 
(1996), supra; Perry-O'Keefe et al. (1996) Proc. Natl. Acad. Sci. USA 93:14670. 
PNAs can be further modified, e.g., to enhance their stability, specificity or cellular 
uptake, by attaching lipophilic or other helper groups to PNA, by the formation of 
PNA-DNA chimeras, or by the use of liposomes or other techniques of drug delivery 
known in the art. The synthesis of PNA-DNA chimeras can be performed as 
described in Hyrup (1996), supra, Finn et al. (1996) Nucleic Acids Res. 24(17):3357-63, 
Mag et al. (1989) Nucleic Acids Res. 17:5973, and Peterser et al. (1975) 
Bioorganic Med. Chem. Lett. 5:1119. 

The nucleic acid molecules and fragments of the invention can also include 
other appended groups such as peptides (e.g., for targeting host cell sulfatases in 
vivo), or agents facilitating transport across the cell membrane (see, e.g., Letsinger et 
al. (1989) Proc. Natl. Acad. Sci. USA 86:6553-6556; Lemaitre et al. (1987) Proc. 
Natl. Acad. Sci. USA 84:648-652; PCT Publication No. WO 88/0918) or the blood 
brain barrier (see, e.g., PCT Publication No. WO 89/10134). In addition, 
oligonucleotides can be modified with hybridization-triggered cleavage agents (see, 
e.g., Krol et al. (1988) Bio-Techniques 6:958-976) or intercalating agents (see, e.g., 
Zon (1988) Pharm. Res. 5:539-549). 

Sulfatase polynucleotides are also useful as primers for PCR to amplify any 
given region of a sulfatase polynucleotide. 

Sulfatase polynucleotides are also useful for constructing recombinant vectors. 
Such vectors include expression vectors that express a portion of, or all of, the sulfatase 
polypeptides. Vectors also include insertion vectors, used to integrate into another 
polynucleotide sequence, such as into the cellular genome, to alter in situ expression of 
sulfatase genes and gene products. For example, an endogenous sulfatase coding 
sequence can be replaced via homologous recombination with all or part of the coding 
region containing one or more specifically introduced mutations. 

Sulfatase polynucleotides are also useful for expressing antigenic portions of 
sulfatase proteins. 

Sulfatase polynucleotides are also useful as probes for determining the 
chromosomal positions of sulfatase polynucleotides by means of in situ hybridization 
methods, such as FISH (for a review of this technique, see Verma et al. (1988) Human 
Chromosomes: A Manual of Basic Techniques (Pergamon Press, New York)), and PCR 
mapping of somatic cell hybrids. The mapping of the sequences to chromosomes is an 
important first step in correlating these sequences with genes associated with disease. 

Reagents for chromosome mapping can be used individually to mark a single 
chromosome or a single site on that chromosome, or panels of reagents can be used for 
marking multiple sites and/or multiple chromosomes. Reagents corresponding to 
noncoding regions of the genes actually are preferred for mapping purposes. Coding 
sequences are more likely to be conserved within gene families, thus increasing the 
chance of cross hybridizations during chromosomal mapping. 

Once a sequence has been mapped to a precise chromosomal location, the 
physical position of the sequence on the chromosome can be correlated with genetic map 
data. (Such data are found, for example, in V. McKusick, Mendelian Inheritance in 
Man, available on-line through Johns Hopkins University Welch Medical Library). The 
relationship between a gene and a disease mapped to the same chromosomal region can 
then be identified through linkage analysis (co-inheritance of physically adjacent genes), 
described in, for example, Egeland et al. ((1987) Nature 325:783-787). 

Moreover, differences in the DNA sequences between individuals affected and 
unaffected with a disease associated with a specified gene can be determined. If a 
mutation is observed in some or all of the affected individuals but not in any unaffected 
individuals, then the mutation is likely to be the causative agent of the particular disease. 
Comparison of affected and unaffected individuals generally involves first looking for 
structural alterations in the chromosomes, such as deletions or translocations, that are 
visible from chromosome spreads, or detectable using PCR based on that DNA 
sequence. Ultimately, complete sequencing of genes from several individuals can be 
performed to confirm the presence of a mutation and to distinguish mutations from 
polymorphisms. 

Sulfatase polynucleotide probes are also useful to determine patterns of the 
presence of the gene encoding sulfatases and their variants with respect to tissue 
distribution, for example, whether gene duplication has occurred and whether the 
duplication occurs in all or only a subset of tissues. The genes can be naturally occurring 
or can have been introduced into a cell, tissue, or organism exogenously. 

Sulfatase polynucleotides are also useful for designing ribozymes corresponding 
to all, or a part, of the mRNA produced from genes encoding the polynucleotides 
described herein. 

Sulfatase polynucleotides are also useful for constructing host cells expressing a 
part, or all, of a sulfatase polynucleotide or polypeptide. 

Sulfatase polynucleotides are also useful for constructing transgenic animals 
expressing all, or a part, of a sulfatase polynucleotide or polypeptide. 

Sulfatase polynucleotides are also useful for making vectors that express part, or 
all, of a sulfatase polypeptide. 

Sulfatase polynucleotides are also useful as hybridization probes for determining 
the level of sulfatase nucleic acid expression. Accordingly, the probes can be used to 
detect the presence of, or to determine levels of, sulfatase nucleic acid in cells, tissues, 
and in organisms. The nucleic acid whose level is determined can be DNA or RNA. 
Accordingly, probes corresponding to the polypeptides described herein can be used to 
assess gene copy number in a given cell, tissue, or organism. This is particularly 
relevant in cases in which there has been an amplification of a sulfatase gene. 

Alternatively, the probe can be used in an in situ hybridization context to assess 
the position of extra copies of a sulfatase gene, as on extrachromosomal elements or as 
integrated into chromosomes in which the sulfatase gene is not normally found, for 
example, as a homogeneously staining region. 

WO 01/55411 PCT/US01/03266 

These uses are relevant for diagnosis of disorders involving an increase or 
decrease in sulfatase expression relative to normal, such as a proliferative disorder, a 
differentiative or developmental disorder, or a hematopoietic disorder. Disorders in 
which sulfatase expression is relevant include, but are not limited to, those disclosed 
herein above. 

Disorders in which 22438 sulfatase expression is relevant include, but are not 
limited to, those involving the tissues as disclosed herein and those associated with 
pain. 

Disorders in which 23553 sulfatase expression is relevant include, but are not 
limited to, breast and colon carcinoma. 

Disorders in which 25278 sulfatase expression is relevant include, but are not 
limited to, colon carcinoma. 

Disorders in which 26212 sulfatase expression is relevant include, but are not 
limited to, hemangioma and uterine adenocarcinoma. 

Thus, the present invention provides a method for identifying a disease or 
disorder associated with aberrant expression or activity of a sulfatase nucleic acid, in 
which a test sample is obtained from a subject and nucleic acid (e.g., mRNA, genomic 
DNA) is detected, wherein the presence of the nucleic acid is diagnostic for a subject 
having or at risk of developing a disease or disorder associated with aberrant expression 
or activity of the nucleic acid. 

One aspect of the invention relates to diagnostic assays for determining 
nucleic acid expression as well as activity in the context of a biological sample (e.g., 
blood, serum, cells, tissue) to determine whether an individual has a disease or 
disorder, or is at risk of developing a disease or disorder, associated with aberrant 
nucleic acid expression or activity. Such assays can be used for prognostic or 
predictive purposes to thereby prophylactically treat an individual prior to the onset of 
a disorder characterized by or associated with expression or activity of the nucleic 
acid molecules. 

In vitro techniques for detection of mRNA include Northern hybridizations and 
in situ hybridizations. In vitro techniques for detecting DNA include Southern 
hybridizations and in situ hybridization. 

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues 
that express a sulfatase, such as by measuring the level of a sulfatase-encoding nucleic 
acid in a sample of cells from a subject, e.g., mRNA or genomic DNA, or determining if 
the sulfatase gene has been mutated. 

Nucleic acid expression assays are useful for drug screening to identify 
compounds that modulate sulfatase nucleic acid expression (e.g., antisense, 
polypeptides, peptidomimetics, small molecules or other drugs). A cell is contacted 
with a candidate compound and the expression of mRNA determined. The level of 
expression of the mRNA in the presence of the candidate compound is compared to the 
level of expression of the mRNA in the absence of the candidate compound. The 
candidate compound can then be identified as a modulator of nucleic acid expression 
based on this comparison and be used, for example, to treat a disorder characterized by 
aberrant nucleic acid expression. The modulator can bind to the nucleic acid or 
indirectly modulate expression, such as by interacting with other cellular components 
that affect nucleic acid expression. 
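
The comparison at the heart of this screen can be expressed as a simple fold-change calculation. The sketch below is illustrative only: the twofold cut-off, the labels, and the function name are assumptions, and the input values stand in for whatever expression measurement the practitioner uses.

```python
def classify_modulator(level_with, level_without, threshold=2.0):
    """Compare mRNA expression with and without a candidate compound.

    Returns (fold_change, call), where call is 'stimulator', 'inhibitor',
    or 'no effect'. `threshold` is an assumed cut-off, not a value from the text.
    """
    if level_with < 0 or level_without <= 0:
        raise ValueError("expression levels must be non-negative, baseline positive")
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    fold_change = level_with / level_without
    if fold_change >= threshold:
        call = "stimulator"
    elif fold_change <= 1.0 / threshold:
        call = "inhibitor"
    else:
        call = "no effect"
    return fold_change, call


if __name__ == "__main__":
    # Hypothetical measurements in arbitrary units.
    print(classify_modulator(300.0, 100.0))
    print(classify_modulator(25.0, 100.0))
```

In practice the cut-off would be set from replicate measurements and appropriate controls rather than fixed in advance.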

Modulatory methods can be performed in vitro (e.g., by culturing the cell with 
the agent) or, alternatively, in vivo (e.g., by administering the agent to a subject) in 
patients or in transgenic animals. The invention thus provides a method for identifying a 
compound that can be used to treat a disorder associated with nucleic acid expression of 
a sulfatase gene. The method typically includes assaying the ability of the compound to 
modulate the expression of the sulfatase nucleic acid and thus identifying a compound 
that can be used to treat a disorder characterized by undesired sulfatase nucleic acid 
expression. 

The assays can be performed in cell-based and cell-free systems. Cell-based 
assays include cells naturally expressing the sulfatase nucleic acid or recombinant cells 
genetically engineered to express specific nucleic acid sequences. Alternatively, 
candidate compounds can be assayed in vivo in patients or in transgenic animals. 

The assay for sulfatase nucleic acid expression can involve direct assay of 
nucleic acid levels, such as mRNA levels, or on collateral compounds (such as substrate 

Nakazawa et al. (1994) PNAS 91:360-364), the latter of which can be particularly useful 
for detecting point mutations in the gene (see Abravaya et al. (1995) Nucleic Acids Res. 
23:675-682). This method can include the steps of collecting a sample of cells from a 
patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the 
sample, contacting the nucleic acid sample with one or more primers which specifically 
hybridize to a gene under conditions such that hybridization and amplification of the 
gene (if present) occurs, and detecting the presence or absence of an amplification 
product, or detecting the size of the amplification product and comparing the length to a 
control sample. Deletions and insertions can be detected by a change in size of the 
amplified product compared to the normal genotype. Point mutations can be identified 
by hybridizing amplified DNA to normal RNA or antisense DNA sequences. 
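
The size comparison in the last detection step can be sketched as follows. The function name, the optional measurement tolerance, and the example lengths are hypothetical; the sketch only classifies an amplification product against a control length and does not model the amplification itself.

```python
def classify_amplicon(sample_bp, control_bp, tolerance_bp=0):
    """Classify a PCR product by its length relative to a control product.

    `sample_bp` is None when no amplification product was detected.
    `tolerance_bp` is an assumed allowance for sizing error on a gel.
    """
    if sample_bp is None:
        return "no product"
    difference = sample_bp - control_bp
    if difference > tolerance_bp:
        return f"insertion (+{difference} bp)"
    if difference < -tolerance_bp:
        return f"deletion ({difference} bp)"
    return "same size as control"


if __name__ == "__main__":
    for observed in (512, 470, 500, None):
        print(observed, "->", classify_amplicon(observed, control_bp=500))
```

A product of unchanged size does not rule out a point mutation, which is why the text turns to hybridization for that case.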

It is anticipated that PCR and/or LCR may be desirable to use as a preliminary 
amplification step in conjunction with any of the techniques used for detecting 
mutations described herein. 

Alternative amplification methods include: self sustained sequence replication 
(Guatelli et al. (1990) Proc. Natl. Acad. Sci. USA 87:1874-1878), transcriptional 
amplification system (Kwoh et al. (1989) Proc. Natl. Acad. Sci. USA 86:1173-1177), 
Q-Beta Replicase (Lizardi et al. (1988) Bio/Technology 6:1197), or any other nucleic 
acid amplification method, followed by the detection of the amplified molecules using 
techniques well-known to those of skill in the art. These detection schemes are 
especially useful for the detection of nucleic acid molecules if such molecules are 
present in very low numbers. 

Alternatively, mutations in a sulfatase gene can be directly identified, for 
example, by alterations in restriction enzyme digestion patterns determined by gel 
electrophoresis. 

Further, sequence-specific ribozymes (U.S. Patent No. 5,498,531) can be used to 
score for the presence of specific mutations by development or loss of a ribozyme 
cleavage site. 

Perfectly matched sequences can be distinguished from mismatched sequences 
by nuclease cleavage digestion assays or by differences in melting temperature. 

Sequence changes at specific locations can also be assessed by nuclease 
protection assays such as RNase and S1 protection or the chemical cleavage method. 

Furthermore, sequence differences between a mutant sulfatase gene and a wild-type 
gene can be determined by direct DNA sequencing. A variety of automated 
sequencing procedures can be utilized when performing the diagnostic assays ((1995) 
Biotechniques 19:448), including sequencing by mass spectrometry (see, e.g., PCT 
International Publication No. WO 94/16101; Cohen et al. (1996) Adv. Chromatogr. 
36:127-162; and Griffin et al. (1993) Appl. Biochem. Biotechnol. 38:147-159). 

Other methods for detecting mutations in the gene include methods in which 
protection from cleavage agents is used to detect mismatched bases in RNA/RNA or 
RNA/DNA duplexes (Myers et al. (1985) Science 230:1242; Cotton et al. (1988) PNAS 
85:4397; Saleeba et al. (1992) Meth. Enzymol. 217:286-295), electrophoretic mobility of 
mutant and wild type nucleic acid is compared (Orita et al. (1989) PNAS 86:2766; 
Cotton et al. (1995) Mutat. Res. 285:125-144; and Hayashi et al. (1992) Genet. Anal. 
Tech. Appl. 9:73-79), and movement of mutant or wild-type fragments in 
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing 
gradient gel electrophoresis (Myers et al. (1985) Nature 313:495). The sensitivity of the 
assay may be enhanced by using RNA (rather than DNA), in which the secondary 
structure is more sensitive to a change in sequence. In one embodiment, the subject 
method utilizes heteroduplex analysis to separate double stranded heteroduplex 
molecules on the basis of changes in electrophoretic mobility (Keen et al. (1991) 
Trends Genet. 7:5). Examples of other techniques for detecting point mutations include 
selective oligonucleotide hybridization, selective amplification, and selective primer 
extension. 

In other embodiments, genetic mutations can be identified by hybridizing a 
sample and control nucleic acids, e.g., DNA or RNA, to high density arrays 
containing hundreds or thousands of oligonucleotide probes (Cronin et al. (1996) 
Human Mutation 7:244-255; Kozal et al. (1996) Nature Medicine 2:753-759). For 
example, genetic mutations can be identified in two dimensional arrays containing 
light-generated DNA probes as described in Cronin et al. supra. Briefly, a first 
hybridization array of probes can be used to scan through long stretches of DNA in a 
sample and control to identify base changes between the sequences by making linear 
arrays of sequential overlapping probes. This step allows the identification of point 
mutations. This step is followed by a second hybridization array that allows the 
characterization of specific mutations by using smaller, specialized probe arrays 
complementary to all variants or mutations detected. Each mutation array is 
composed of parallel probe sets, one complementary to the wild-type gene and the 
other complementary to the mutant gene. 

Sulfatase polynucleotides are also useful for testing an individual for a genotype 
that, while not necessarily causing the disease, nevertheless affects the treatment 
modality. Thus, the polynucleotides can be used to study the relationship between an 
individual's genotype and the individual's response to a compound used for treatment 
(pharmacogenomic relationship). In the present case, for example, a mutation in the 
sulfatase gene that results in altered affinity for a substrate-related compound could 
result in an excessive or decreased drug effect with standard concentrations of the 
compound. Accordingly, the sulfatase polynucleotides described herein can be used to 
assess the mutation content of the gene in an individual in order to select an appropriate 
compound or dosage regimen for treatment. 



diagnostic target that can be used to tailor treatment in an individual. Accordingly, the 
production of recombinant cells and animals containing these polymorphisms allow 



The methods can involve obtaining a control biological sample from a control 
subject, contacting the control sample with a compound or agent capable of detecting 
mRNA, or genomic DNA, such that the presence of mRNA or genomic DNA is 
detected in the biological sample, and comparing the presence of mRNA or genomic 
DNA in the control sample with the presence of mRNA or genomic DNA in the test 
sample. 

Sulfatase polynucleotides are also useful for chromosome identification when the 
sequence is identified with an individual chromosome and to a particular location on the 
chromosome. First, the DNA sequence is matched to the chromosome by in situ or 
other chromosome-specific hybridization. Sequences can also be correlated to specific 
chromosomes by preparing PCR primers that can be used for PCR screening of somatic 
cell hybrids containing individual chromosomes from the desired species. Only hybrids 
containing the chromosome containing the gene homologous to the primer will yield an 
amplified fragment. Sublocalization can be achieved using chromosomal fragments. 
Other strategies include prescreening with labeled flow-sorted chromosomes and 
preselection by hybridization to chromosome-specific libraries. Further mapping 
strategies include fluorescence in situ hybridization, which allows hybridization with 
probes shorter than those traditionally used. Reagents for chromosome mapping can be 
used individually to mark a single chromosome or a single site on the chromosome, or 
panels of reagents can be used for marking multiple sites and/or multiple chromosomes. 
Reagents corresponding to noncoding regions of the genes actually are preferred for 
mapping purposes. Coding sequences are more likely to be conserved within gene 
families, thus increasing the chance of cross hybridizations during chromosomal 
mapping. 
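
The logic of PCR screening across a somatic cell hybrid panel can be sketched as a concordance test: the gene is assigned to the chromosome that is present in every hybrid yielding an amplified fragment and absent from every hybrid that does not. The panel contents, hybrid names, and function name below are hypothetical and serve only to illustrate the reasoning.

```python
def assign_chromosome(hybrid_panel, pcr_positive):
    """Return the chromosomes whose presence across the panel matches the
    PCR results exactly (present in each positive hybrid, absent from each
    negative hybrid)."""
    candidates = set().union(*hybrid_panel.values())
    concordant = []
    for chromosome in sorted(candidates):
        if all((chromosome in contents) == (name in pcr_positive)
               for name, contents in hybrid_panel.items()):
            concordant.append(chromosome)
    return concordant


if __name__ == "__main__":
    # Hypothetical panel: each hybrid retains a few chromosomes of the species of interest.
    panel = {
        "hybrid_A": {"3", "7", "X"},
        "hybrid_B": {"7", "12"},
        "hybrid_C": {"3", "12"},
        "hybrid_D": {"5", "X"},
    }
    positives = {"hybrid_A", "hybrid_B"}  # hybrids that yielded an amplified fragment
    print(assign_chromosome(panel, positives))
```

If more than one chromosome is returned, the panel does not discriminate between them and further hybrids would be needed.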

Sulfatase polynucleotides can also be used to identify individuals from small 
biological samples. This can be done for example using restriction fragment-length 
polymorphism (RFLP) to identify an individual. Thus, the polynucleotides described 
herein are useful as DNA markers for RFLP (See U.S. Patent No. 5,272,057). 

Furthermore, the sulfatase sequences can be used to provide an alternative 
technique, which determines the actual DNA sequence of selected fragments in the 
genome of an individual. Thus, the sulfatase sequences described herein can be used to 
prepare two PCR primers from the 5' and 3' ends of the sequences. These primers can 
then be used to amplify DNA from an individual for subsequent sequencing. 
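The preparation of two primers from the 5' and 3' ends of a sequence can be sketched as follows. This is a minimal illustration only: the template is a hypothetical string rather than any sequence of the invention, the function names are assumptions introduced here, and practical primer design would also weigh melting temperature, length, and specificity.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def end_primers(template: str, length: int = 20) -> tuple[str, str]:
    """Derive a forward and a reverse primer from the two ends of a template.

    The forward primer copies the 5' end of the given strand. The reverse
    primer is the reverse complement of its 3' end, so both read 5' to 3'.
    """
    if len(template) < 2 * length:
        raise ValueError("template too short for two non-overlapping primers")
    forward = template[:length].upper()
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Hypothetical 24-base template, used only to show the mechanics.
template = "ATGGCCTTAGGCAAATTTCCGTAA"
fwd, rev = end_primers(template, length=8)
print(fwd, rev)
```

The reverse primer is taken from the opposite strand so that the two primers point toward each other and bracket the region to be amplified.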

Panels of corresponding DNA sequences from individuals prepared in this 
manner can provide unique individual identifications, as each individual will have a 
unique set of such DNA sequences. It is estimated that allelic variation in humans 
occurs with a frequency of about once per each 500 bases. Allelic variation occurs to 
some degree in the coding regions of these sequences, and to a greater degree in the 
noncoding regions. Sulfatase sequences can be used to obtain such identification 
sequences from individuals and from tissue. The sequences represent unique fragments 
of the human genome. Each of the sequences described herein can, to some degree, be 
used as a standard against which DNA from an individual can be compared for 
identification purposes. 
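Comparison of an individual's DNA against a standard sequence can be illustrated with a short sketch. It assumes the two sequences are already aligned and of equal length, and both sequences shown are hypothetical; the function name is an assumption introduced for illustration.

```python
def variant_positions(standard: str, sample: str) -> list[tuple[int, str, str]]:
    """List 1-based positions where a sample differs from a standard sequence.

    Assumes both sequences are already aligned and equal in length; real
    data would need an alignment step first.
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i + 1, ref, alt)
        for i, (ref, alt) in enumerate(zip(standard.upper(), sample.upper()))
        if ref != alt
    ]


# Hypothetical sequences for illustration only.
standard = "ACGTACGTACGT"
sample = "ACGTTCGTACGA"
print(variant_positions(standard, sample))
```

At the estimated frequency of about one variation per 500 bases, a panel covering 10,000 bases would be expected to carry roughly 20 such differences, which is what makes a set of these sequences distinctive for an individual.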

If a panel of reagents from the sequences is used to generate a unique 
identification database for an individual, those same reagents can later be used to identify 
tissue from that individual. Using the unique identification database, positive 
identification of the individual, living or dead, can be made from extremely small tissue 
samples. 

Sulfatase polynucleotides can also be used in forensic identification procedures. 
PCR technology can be used to amplify DNA sequences taken from very small 
biological samples, such as a single hair follicle or body fluids (e.g., blood, saliva, or 
semen). The amplified sequence can then be compared to a standard, allowing 
identification of the origin of the sample. 

Sulfatase polynucleotides can thus be used to provide polynucleotide reagents, 
e.g., PCR primers, targeted to specific loci in the human genome, which can enhance the 
reliability of DNA-based forensic identifications by, for example, providing another 
"identification marker" (i.e. another DNA sequence that is unique to a particular 
individual). As described above, actual base sequence information can be used for 
identification as an accurate alternative to patterns formed by restriction enzyme 
generated fragments. Sequences targeted to the noncoding region are particularly useful 
since greater polymorphism occurs in the noncoding regions, making it easier to 
differentiate individuals using this technique. 

Sulfatase polynucleotides can further be used to provide polynucleotide reagents, 
e.g., labeled or labelable probes which can be used in, for example, an in situ 
hybridization technique, to identify a specific tissue. This is useful in cases in which a 
forensic pathologist is presented with a tissue of unknown origin. Panels of sulfatase 
probes can be used to identify tissue by species and/or by organ type. 

In a similar fashion, these primers and probes can be used to screen tissue culture 
for contamination (i.e. screen for the presence of a mixture of different types of cells in a 
culture). 

Alternatively, sulfatase polynucleotides can be used directly to block 
transcription or translation of sulfatase gene sequences by means of antisense or 
ribozyme constructs. Thus, in a disorder characterized by abnormally high or 
undesirable sulfatase gene expression, nucleic acids can be directly used for treatment. 

Sulfatase polynucleotides are thus useful as antisense constructs to control 
sulfatase gene expression in cells, tissues, and organisms. A DNA antisense 
polynucleotide is designed to be complementary to a region of the gene involved in 
transcription, preventing transcription and hence production of sulfatase protein. An 
antisense RNA or DNA polynucleotide would hybridize to the mRNA and thus block 
translation of mRNA into sulfatase protein. 

Examples of antisense molecules useful to inhibit nucleic acid expression include 
antisense molecules complementary to a fragment of the 5' untranslated region of SEQ 
ID NOS:2, 4, 6, or 8, which also includes the start codon, and antisense molecules which 
are complementary to a fragment of the 3' untranslated region of SEQ ID NOS:2, 4, 6, or 
8. 

Alternatively, a class of antisense molecules can be used to inactivate mRNA in 
order to decrease expression of sulfatase nucleic acid. Accordingly, these molecules can 
treat a disorder characterized by abnormal or undesired sulfatase nucleic acid expression. 
This technique involves cleavage by means of ribozymes containing nucleotide 
sequences complementary to one or more regions in the mRNA that attenuate the ability 
of the mRNA to be translated. Possible regions include coding regions and particularly 
coding regions corresponding to the catalytic and other functional activities of the 


containing cells that are aberrant in sulfatase gene expression. Thus, recombinant cells, 
which include the patient's cells that have been engineered ex vivo and returned to the 
patient, are introduced into an individual where the cells produce the desired sulfatase 
protein to treat the individual. 

The invention also encompasses kits for detecting the presence of a sulfatase 
nucleic acid in a biological sample. For example, the kit can comprise reagents such as a 
labeled or labelable nucleic acid or agent capable of detecting sulfatase nucleic acid in a 
biological sample; means for determining the amount of sulfatase nucleic acid in the 
sample; and means for comparing the amount of sulfatase nucleic acid in the sample 
with a standard. The compound or agent can be packaged in a suitable container. The 
kit can further comprise instructions for using the kit to detect sulfatase mRNA or DNA. 



The nucleotide or amino acid sequences of the invention are also provided in a 
variety of mediums to facilitate use thereof. As used herein, "provided" refers to a 
manufacture, other than an isolated nucleic acid or amino acid molecule, which 
contains a nucleotide or amino acid sequence of the present invention. Such a 
manufacture provides the nucleotide or amino acid sequences, or a subset thereof 
(e.g., a subset of open reading frames (ORFs)) in a form which allows a skilled artisan 
to examine the manufacture using means not directly applicable to examining the 
nucleotide or amino acid sequences, or a subset thereof, as they exist in nature or in 
purified form. 

In one application of this embodiment, a nucleotide or amino acid sequence of 
the present invention can be recorded on computer readable media. As used herein, 
"computer readable media" refers to any medium that can be read and accessed 
directly by a computer. Such media include, but are not limited to: magnetic storage 
media, such as floppy discs, hard disc storage medium, and magnetic tape; optical 
storage media such as CD-ROM; electrical storage media such as RAM and ROM; 
and hybrids of these categories such as magnetic/optical storage media. The skilled 
artisan will readily appreciate how any of the presently known computer readable 
mediums can be used to create a manufacture comprising computer readable medium 
having recorded thereon a nucleotide or amino acid sequence of the present invention. 

As used herein, "recorded" refers to a process for storing information on 
computer readable medium. The skilled artisan can readily adopt any of the presently 
known methods for recording information on computer readable medium to generate 
manufactures comprising the nucleotide or amino acid sequence information of the 
present invention. 

A variety of data storage structures are available to a skilled artisan for 
creating a computer readable medium having recorded thereon a nucleotide or amino 
acid sequence of the present invention. The choice of the data storage structure will 
generally be based on the means chosen to access the stored information. In addition, 
a variety of data processor programs and formats can be used to store the nucleotide 
sequence information of the present invention on computer readable medium. The 
sequence information can be represented in a word processing text file, formatted in 
commercially-available software such as WordPerfect and Microsoft Word, or 
represented in the form of an ASCII file, stored in a database application, such as 
DB2, Sybase, Oracle, or the like. The skilled artisan can readily adapt any number of 
data processor structuring formats (e.g., text file or database) in order to obtain 
computer readable medium having recorded thereon the nucleotide sequence 
information of the present invention. 
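The two storage formats mentioned above, an ASCII text file and a database table, can be sketched as follows. This is an illustration under stated assumptions: the FASTA-style layout, the function names, the table and column names, and the sequence itself are all hypothetical, and SQLite from the Python standard library stands in for the database applications named in the text.

```python
import sqlite3
import tempfile
from pathlib import Path


def write_ascii(path: Path, name: str, sequence: str, width: int = 60) -> None:
    """Record a sequence as plain ASCII text, FASTA-style, wrapped at `width`."""
    lines = [f">{name}"]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    path.write_text("\n".join(lines) + "\n", encoding="ascii")


def read_ascii(path: Path) -> tuple[str, str]:
    """Read back the single record written by write_ascii."""
    header, *body = path.read_text(encoding="ascii").splitlines()
    return header[1:], "".join(body)


def store_in_database(conn: sqlite3.Connection, name: str, sequence: str) -> None:
    """Record a sequence in a table (hypothetical schema, SQLite as stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences (name TEXT PRIMARY KEY, residues TEXT)"
    )
    conn.execute("INSERT OR REPLACE INTO sequences VALUES (?, ?)", (name, sequence))
    conn.commit()


# Hypothetical record used only to demonstrate both storage routes.
with tempfile.TemporaryDirectory() as folder:
    text_file = Path(folder) / "example.txt"
    write_ascii(text_file, "example", "ATGGCCTTAGGC", width=5)
    print(read_ascii(text_file))

conn = sqlite3.connect(":memory:")
store_in_database(conn, "example", "ATGGCCTTAGGC")
print(conn.execute("SELECT residues FROM sequences WHERE name = ?", ("example",)).fetchone())
```

The choice between the two follows the point made in the text: a flat file suits sequential reading, while a table suits lookup by name.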

By providing the nucleotide or amino acid sequences of the invention in 
computer readable form, the skilled artisan can routinely access the sequence 
information for a variety of purposes. For example, one skilled in the art can use the 
nucleotide or amino acid sequences of the invention in computer readable form to 
compare a target sequence or target structural motif with the sequence information 
stored within the data storage means. Search means are used to identify fragments or 
regions of the sequences of the invention which match a particular target sequence or 
target motif. 
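A simple form of such a search can be sketched with regular expressions. The sketch below is only an assumption about how a search means might be written; the function names, the stored sequences, and the motif pattern are hypothetical, and the pattern is not a motif taken from the sequences of the invention.

```python
import re


def find_target(sequence: str, target: str) -> list[int]:
    """Return 0-based start positions of every exact match, overlaps included."""
    # A lookahead matches at each position without consuming characters,
    # so overlapping occurrences are all reported.
    pattern = re.compile(f"(?={re.escape(target)})")
    return [match.start() for match in pattern.finditer(sequence)]


def find_motif(sequence: str, motif_regex: str) -> list[tuple[int, str]]:
    """Return (start, matched text) for non-overlapping matches of a pattern."""
    return [(m.start(), m.group()) for m in re.finditer(motif_regex, sequence)]


# Hypothetical stored sequences and targets, for illustration only.
print(find_target("ATGATGAAATGA", "ATGA"))
print(find_motif("GGTATAAAGGTATATACC", "TATA[AT]A"))
```

An exact-match search suits a target sequence, while a pattern with alternatives suits a motif whose positions tolerate more than one residue.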

As used herein, a "target sequence" can be any DNA or amino acid sequence 
of six or more nucleotides or two or more amino acids. A skilled artisan can readily 
recognize that the longer a target sequence is, the less likely a target sequence will be 
present as a random occurrence in the database. The most preferred sequence length 
of a target sequence is from about 10 to 100 amino acids or from about 30 to 300 
nucleotide residues. However, it is well recognized that commercially important 
fragments, such as sequence fragments involved in gene expression and protein 
processing, may be of shorter length. 
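The relationship between target length and random occurrence can be made concrete with a rough estimate. The sketch assumes, purely for illustration, that residues in the database are independent and equally frequent, which real sequences are not; the function name and the database size are hypothetical.

```python
def expected_random_matches(
    target_length: int, database_length: int, alphabet_size: int
) -> float:
    """Estimate how often one specific target would occur by chance.

    Simplifying assumption: every residue is independent and equally likely,
    so a given window matches with probability 1 / alphabet_size**target_length.
    """
    if target_length > database_length:
        return 0.0
    windows = database_length - target_length + 1
    return windows / alphabet_size ** target_length


# Hypothetical database of one million nucleotides (alphabet of 4).
for length in (6, 10, 30):
    print(length, expected_random_matches(length, 1_000_000, 4))
```

Under these assumptions a six-nucleotide target would be expected about 244 times by chance in a million bases, while a thirty-nucleotide target would essentially never occur at random, consistent with the preference for longer targets.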

As used herein, "a target structural motif," or "target motif," refers to any 
rationally selected sequence or combination of sequences in which the sequence(s) are 
chosen based on a three-dimensional configuration which is formed upon the folding 
of the target motif. There are a variety of target motifs known in the art. Protein 
target motifs include, but are not limited to, enzyme active sites and signal sequences. 
Nucleic acid target motifs include, but are not limited to, promoter sequences, hairpin 
structures and inducible expression elements (protein binding sequences). 

Computer software is publicly available which allows a skilled artisan to 
access sequence information provided in a computer readable medium for analysis 
and comparison to other sequences. A variety of known algorithms are disclosed 
publicly, and a variety of commercially available software for conducting search 
means exists and can be used in the computer-based systems of the present invention. 
Examples of such software include, but are not limited to, MacPattern (EMBL), 
BLASTN and BLASTX (NCBI). 
For example, software which implements the BLAST (Altschul et al. (1990) J. 
Mol. Biol. 215:403-410) and BLAZE (Brutlag et al. (1993) Comp. Chem. 17:203-207) 
search algorithms on a Sybase system can be used to identify open reading frames 
(ORFs) of the sequences of the invention which contain homology to ORFs or 
proteins from other libraries. Such ORFs are protein encoding fragments and are 
useful in producing commercially important proteins such as enzymes used in various 
reactions and in the production of commercially useful metabolites. 
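The first step of that process, locating open reading frames in a stored sequence, can be sketched as a simple scan. This is not the BLAST or BLAZE algorithm and performs no homology comparison; it assumes an ORF runs from an ATG to the first in-frame stop codon on the given strand only, and the function name and test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Scan the three forward reading frames for ATG...stop open reading frames.

    Returns (start, end, orf) with a 0-based start and an exclusive end.
    The stop codon is included in the reported ORF and in the codon count.
    """
    seq = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # resume looking for the next start codon
    return sorted(orfs)


# Hypothetical sequence for illustration only.
print(find_orfs("CCATGAAATTTTAGGG"))
```

A fuller treatment would also scan the reverse complement and then pass each ORF to a homology search against other libraries.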

Vectors/Host Cells 

The invention also provides vectors containing sulfatase polynucleotides. The 
term "vector" refers to a vehicle, preferably a nucleic acid molecule that can transport 
sulfatase polynucleotides. When the vector is a nucleic acid molecule, the sulfatase 
polynucleotides are covalently linked to the vector nucleic acid. With this aspect of the 
invention, the vector includes a plasmid, single or double stranded phage, a single or 
double stranded RNA or DNA viral vector, or artificial chromosome, such as a BAC, 
PAC, YAC, or MAC. 

A vector can be maintained in the host cell as an extrachromosomal element 
where it replicates and produces additional copies of sulfatase polynucleotides. 
Alternatively, the vector may integrate into the host cell genome and produce additional 
copies of sulfatase polynucleotides when the host cell replicates. 

The invention provides vectors for the maintenance (cloning vectors) or vectors 
for expression (expression vectors) of sulfatase polynucleotides. The vectors can 
function in prokaryotic or eukaryotic cells or in both (shuttle vectors). 

Expression vectors contain cis-acting regulatory regions that are operably linked 
in the vector to sulfatase polynucleotides such that transcription of the polynucleotides is 
allowed in a host cell. The polynucleotides can be introduced into the host cell with a 
separate polynucleotide capable of affecting transcription. Thus, the second 
polynucleotide may provide a trans-acting factor interacting with the cis-regulatory 
control region to allow transcription of sulfatase polynucleotides from the vector. 
Alternatively, a trans-acting factor may be supplied by the host cell. Finally, a 
trans-acting factor can be produced from the vector itself. 
The regulatory sequence may provide constitutive expression in one or more host 
cells (i.e., tissue specific) or may provide for inducible expression in one or more cell 
types such as by temperature, nutrient additive, or exogenous factor such as a hormone 
or other ligand. A variety of vectors providing for constitutive and inducible expression 
in prokaryotic and eukaryotic hosts are well known to those of ordinary skill in the art. 

Sulfatase polynucleotides can be inserted into the vector nucleic acid by 
well-known methodology. Generally, the DNA sequence that will ultimately be expressed is 
joined to an expression vector by cleaving the DNA sequence and the expression vector 
with one or more restriction enzymes and then ligating the fragments together. 
Procedures for restriction enzyme digestion and ligation are well known to those of 
ordinary skill in the art. 
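The cleavage step can be modeled in a few lines. The sketch below treats only the top strand of a linear molecule and uses the EcoRI recognition site GAATTC, which is cut after the first G, as a familiar example; the function name and the input sequence are hypothetical, and no particular enzyme is prescribed by the text.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[str]:
    """Cut a linear DNA sequence at every occurrence of a recognition site.

    `cut_offset` is the number of bases into the site at which the top
    strand is cut (1 for EcoRI, which cuts G^AATTC). Only the top strand
    is modeled, so overhangs on the complementary strand are not shown.
    """
    seq = sequence.upper()
    fragments = []
    previous = 0
    position = seq.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
        position = seq.find(site, position + 1)
    fragments.append(seq[previous:])
    return fragments


# Hypothetical insert carrying two EcoRI sites, for illustration only.
fragments = digest("AAGAATTCGGGGAATTCTT", "GAATTC", cut_offset=1)
print(fragments)
# Rejoining the fragments in order restores the uncut sequence.
print("".join(fragments))
```

Cutting the insert and the vector with the same enzyme leaves matching ends, which is why the fragments can afterward be ligated together.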

The vector containing the appropriate polynucleotide can be introduced into an 
appropriate host cell for propagation or expression using well-known techniques. 
Bacterial cells include, but are not limited to, E. coli, Streptomyces, and Salmonella 
typhimurium. Eukaryotic cells include, but are not limited to, yeast, insect cells such as 
Drosophila, animal cells such as COS and CHO cells, and plant cells. 

As described herein, it may be desirable to express the polypeptide as a fusion 
protein. Accordingly, the invention provides fusion vectors that allow for the production 
of sulfatase polypeptides. Fusion vectors can increase the expression of a recombinant 
protein, increase the solubility of the recombinant protein, and aid in the purification of 
the protein by acting, for example, as a ligand for affinity purification. A proteolytic 
cleavage site may be introduced at the junction of the fusion moiety so that the desired 
polypeptide can ultimately be separated from the fusion moiety. Proteolytic enzymes 
include, but are not limited to, factor Xa, thrombin, and enterokinase. Typical fusion 
expression vectors include pGEX (Smith et al. (1988) Gene 67:31-40), pMAL (New 
England Biolabs, Beverly, MA) and pRIT5 (Pharmacia, Piscataway, NJ) which fuse 
glutathione S-transferase (GST), maltose E binding protein, or protein A, respectively, to 
the target recombinant protein. Examples of suitable inducible non-fusion E. coli 
expression vectors include pTrc (Amann et al. (1988) Gene 69:301-315) and pET 11d 
(Studier et al. (1990) Gene Expression Technology: Methods in Enzymology 185:60-89). 

Recombinant protein expression can be maximized in a host bacterium by 
providing a genetic background wherein the host cell has an impaired capacity to 
proteolytically cleave the recombinant protein (Gottesman, S. (1990) Gene Expression 
Technology: Methods in Enzymology 185, Academic Press, San Diego, California 119- 
128). 

It is further recognized that the nucleic acid sequences of the invention can be 
altered to contain codons which are preferred, or non-preferred, for a particular 
expression system. For example, the nucleic acid can be one in which at least one 
codon, and preferably at least 10% or 20% of the codons, has been altered 
such that the sequence is optimized for expression in E. coli, yeast, human, insect, or 
CHO cells. Methods for determining such codon usage are well known in the art. 
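Codon alteration of this kind can be sketched as a table lookup that swaps each codon for a synonymous one and reports the fraction changed. The preference table below is a hypothetical example restricted to a few leucine and arginine codons, not measured codon usage for any host, and the function name and input sequence are likewise assumptions.

```python
# Hypothetical preference table: each key is replaced by a synonymous codon.
# CTA/TTA/CTG all encode leucine; AGA/AGG/CGT all encode arginine.
PREFERRED = {"CTA": "CTG", "TTA": "CTG", "AGA": "CGT", "AGG": "CGT"}


def alter_codons(coding_sequence: str, preferred: dict[str, str]) -> tuple[str, float]:
    """Swap codons for preferred synonyms and report the fraction altered."""
    seq = coding_sequence.upper()
    if not seq or len(seq) % 3:
        raise ValueError("coding sequence must be a non-empty multiple of 3 bases")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    altered = [preferred.get(codon, codon) for codon in codons]
    changed = sum(old != new for old, new in zip(codons, altered))
    return "".join(altered), changed / len(codons)


# Hypothetical six-codon coding sequence, for illustration only.
optimized, fraction = alter_codons("ATGCTAAGAGGCTTATAA", PREFERRED)
print(optimized, fraction)
```

Because only synonymous codons are exchanged, the encoded polypeptide is unchanged; the reported fraction can be compared against thresholds such as the 10% or 20% mentioned above.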

Sulfatase polynucleotides can also be expressed by expression vectors that are 
operative in yeast. Examples of vectors for expression in yeast, e.g., S. cerevisiae, include 
pYepSec1 (Baldari et al. (1987) EMBO J. 6:229-234), pMFa (Kurjan et al. (1982) Cell 
30:933-943), pJRY88 (Schultz et al. (1987) Gene 54:113-123), and pYES2 (Invitrogen 
Corporation, San Diego, CA). 

Sulfatase polynucleotides can also be expressed in insect cells using, for 
example, baculovirus expression vectors. Baculovirus vectors available for expression 
of proteins in cultured insect cells (e.g., Sf 9 cells) include the pAc series (Smith et al. 
(1983) Mol. Cell Biol. 3:2156-2165) and the pVL series (Lucklow et al. (1989) Virology 
170:31-39). 

In certain embodiments of the invention, the polynucleotides described herein are 
expressed in mammalian cells using mammalian expression vectors. Examples of 
mammalian expression vectors include pCDM8 (Seed, B. (1987) Nature 329:840) and 
pMT2PC (Kaufman et al. (1987) EMBO J. 6:187-195). 

The expression vectors listed herein are provided by way of example only of the 
well-known vectors available to those of ordinary skill in the art that would be useful to 
express sulfatase polynucleotides. The person of ordinary skill in the art would be aware 



polynucleotides described herein. These are found, for example, in Sambrook et al. 
(1989) Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor 
Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY. 

The invention also encompasses vectors in which the nucleic acid sequences 
described herein are cloned into the vector in reverse orientation, but operably linked to a 
regulatory sequence that permits transcription of antisense RNA. Thus, an antisense 
transcript can be produced to all, or to a portion, of the polynucleotide sequences 
described herein, including both coding and non-coding regions. Expression of this 
antisense RNA is subject to each of the parameters described above in relation to 
expression of the sense RNA (regulatory sequences, constitutive or inducible expression, 



The invention also relates to recombinant host cells containing the vectors 
described herein. Host cells therefore include prokaryotic cells, lower eukaryotic cells 
such as yeast, other eukaryotic cells such as insect cells, and higher eukaryotic cells such 
as mammalian cells. 

The recombinant host cells are prepared by introducing the vector constructs 
described herein into the cells by techniques readily available to the person of ordinary 
skill in the art. These include, but are not limited to, calcium phosphate transfection, 



electroporation, transduction, infection, lipofection, and other techniques such as those 
found in Sambrook et al. (Molecular Cloning: A Laboratory Manual, 2d ed., Cold 
Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 
NY). 

Host cells can contain more than one vector. Thus, different nucleotide 
sequences can be introduced on different vectors of the same cell. Similarly, sulfatase 
polynucleotides can be introduced either alone or with other polynucleotides that are not 
related to sulfatase polynucleotides such as those providing trans-acting factors for 
expression vectors. When more than one vector is introduced into a cell, the vectors can 
be introduced independently, co-introduced or joined to the sulfatase polynucleotide 
vector. 

In the case of bacteriophage and viral vectors, these can be introduced into cells 



Viral vectors can be replication-competent or replication-defective. In the case in which 
viral replication is defective, replication will occur in host cells providing functions that 



subpopulation of cells that contain the recombinant vector constructs. The marker can 
be contained in the same vector that contains the polynucleotides described herein or 
may be on a separate vector. Markers include tetracycline or ampicillin-resistance genes 
for prokaryotic host cells and dihydrofolate reductase or neomycin resistance for 
eukaryotic host cells. However, any marker that provides selection for a phenotypic trait 
will be effective. 

While the mature proteins can be produced in bacteria, yeast, mammalian cells, 
and other cells under the control of the appropriate regulatory sequences, cell-free 
transcription and translation systems can also be used to produce these proteins using 
RNA derived from the DNA constructs described herein. 

Where secretion of the polypeptide is desired, appropriate secretion signals are 
incorporated into the vector. The signal sequence can be endogenous to the sulfatase 
polypeptides or heterologous to these polypeptides. 

Where the polypeptide is not secreted into the medium, the protein can be 
isolated from the host cell by standard disruption procedures, including freeze thaw, 
sonication, mechanical disruption, use of lysing agents and the like. The polypeptide can 
then be recovered and purified by well-known purification methods including 
ammonium sulfate precipitation, acid extraction, anion or cation exchange 
chromatography, phosphocellulose chromatography, hydrophobic-interaction 
chromatography, affinity chromatography, hydroxylapatite chromatography, lectin 
chromatography, or high performance liquid chromatography. 

It is also understood that depending upon the host cell in recombinant production 
of the polypeptides described herein, the polypeptides can have various glycosylation 
patterns, depending upon the cell, or may be non-glycosylated as when produced in 
bacteria. In addition, the polypeptides may include an initial modified methionine in 
some cases as a result of a host-mediated process. 

Uses of Vectors and Host Cells 

It is understood that "host cells" and "recombinant host cells" refer not only to 
the particular subject cell but also to the progeny or potential progeny of such a cell. 
Because certain modifications may occur in succeeding generations due to either 
mutation or environmental influences, such progeny may not, in fact, be identical to 
the parent cell, but are still included within the scope of the term as used herein. A 
part of a treatment modality, it may be necessary to effectively inactivate the substrate or 
substrate analog at a specific point in treatment. Providing cells that compete for the 
molecule, but which cannot be affected by sulfatase activation would be beneficial. 

Homologously recombinant host cells can also be produced that allow the in situ 
alteration of endogenous sulfatase polynucleotide sequences in a host cell genome. The 
host cell includes, but is not limited to, a stable cell line, cell in vivo, or cloned 
microorganism. This technology is more fully described in WO 93/09222, WO 
91/12650, WO 91/06667, U.S. 5,272,071, and U.S. 5,641,670. Briefly, specific 
polynucleotide sequences corresponding to the sulfatase polynucleotides or sequences 
proximal or distal to a sulfatase gene are allowed to integrate into a host cell genome by 
homologous recombination where expression of the gene can be affected. In one 
embodiment, regulatory sequences are introduced that either increase or decrease 
expression of an endogenous sequence. Accordingly, a sulfatase protein can be 
produced in a cell not normally producing it. Alternatively, increased expression of 
sulfatase protein can be effected in a cell normally producing the protein at a specific 
level. Further, expression can be decreased or eliminated by introducing a specific 
regulatory sequence. The regulatory sequence can be heterologous to the sulfatase 
protein sequence or can be a homologous sequence with a desired mutation that affects 
expression. Alternatively, the entire gene can be deleted. The regulatory sequence can 
be specific to the host cell or capable of functioning in more than one cell type. Still 
further, specific mutations can be introduced into any desired region of the gene to 
produce mutant sulfatase proteins. Such mutations could be introduced, for example, 
into the specific functional regions such as the peptide substrate-binding site. 

In one embodiment, the host cell can be a fertilized oocyte or embryonic stem 
cell that can be used to produce a transgenic animal containing the altered sulfatase gene. 
Alternatively, the host cell can be a stem cell or other early tissue precursor that gives 
rise to a specific subset of cells and can be used to produce transgenic tissues in an 
animal. See also Thomas et al., Cell 51:503 (1987) for a description of homologous 
recombination vectors. The vector is introduced into an embryonic stem cell line (e.g., 
by electroporation) and cells in which the introduced gene has homologously 
recombined with the endogenous sulfatase gene are selected (see e.g., Li, E. et al. (1992) 
Cell 69:915). The selected cells are then injected into a blastocyst of an animal (e.g., a 
Methods for generating transgenic animals via embryo manipulation and 
microinjection, particularly animals such as mice, have become conventional in the art 
and are described, for example, in U.S. Patent Nos. 4,736,866 and 4,870,009, both by 
Leder et al., U.S. Patent No. 4,873,191 by Wagner et al. and in Hogan, B., Manipulating 
the Mouse Embryo, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 
1986). Similar methods are used for production of other transgenic animals. A 
transgenic founder animal can be identified based upon the presence of the transgene in 
its genome and/or expression of transgenic mRNA in tissues or cells of the animals. A 
transgenic founder animal can then be used to breed additional animals carrying the 
transgene. Moreover, transgenic animals carrying a transgene can further be bred to 
other transgenic animals carrying other transgenes. A transgenic animal also includes 
animals in which the entire animal or tissues in the animal have been produced using the 
homologously recombinant host cells described herein. 

In another embodiment, transgenic non-human animals can be produced which 
contain selected systems, which allow for regulated expression of the transgene. One 
example of such a system is the cre/loxP recombinase system of bacteriophage P1. For 
a description of the cre/loxP recombinase system, see, e.g., Lakso et al. (1992) PNAS 
89:6232-6236. Another example of a recombinase system is the FLP recombinase 
system of S. cerevisiae (O'Gorman et al. (1991) Science 251:1351-1355). If a cre/loxP 
recombinase system is used to regulate expression of the transgene, animals containing 
transgenes encoding both the Cre recombinase and a selected protein are required. Such 
animals can be provided through the construction of "double" transgenic animals, e.g., 
by mating two transgenic animals, one containing a transgene encoding a selected 
protein and the other containing a transgene encoding a recombinase. 

Clones of the non-human transgenic animals described herein can also be 
produced according to the methods described in Wilmut et al. (1997) Nature 385:810- 
813 and PCT International Publication Nos. WO 97/07668 and WO 97/07669. In brief, 
a cell, e.g., a somatic cell, from the transgenic animal can be isolated and induced to exit 
the growth cycle and enter G0 phase. The quiescent cell can then be fused, e.g., through 
the use of electrical pulses, to an enucleated oocyte from an animal of the same species 
from which the quiescent cell is isolated. The reconstructed oocyte is then cultured such 
that it develops to morula or blastocyst and then transferred to a pseudopregnant female 
foster animal. The offspring born of this female animal will be a clone of the animal 
from which the cell, e.g., the somatic cell, is isolated. 

Transgenic animals containing recombinant cells that express the polypeptides 
described herein are useful to conduct the assays described herein in an in vivo context. 
The various physiological factors that are present in vivo and that could 
affect binding or activation may not be evident from in vitro cell-free or cell-based 
assays. Accordingly, it is useful to provide non-human transgenic animals to assay in 
vivo sulfatase function, including peptide interaction, the effect of specific mutant 
sulfatases on sulfatase function and peptide interaction, and the effect of chimeric 
sulfatases. It is also possible to assess the effect of null mutations, that is, mutations that 
substantially or completely eliminate one or more sulfatase functions. 

In general, methods for producing transgenic animals include introducing a 
nucleic acid sequence according to the present invention, the nucleic acid sequence 
capable of expressing the protein in a transgenic animal, into a cell in culture or in 
vivo. When introduced in vivo, the nucleic acid is introduced into an intact organism 
such that one or more cell types and, accordingly, one or more tissue types, express 
the nucleic acid encoding the protein. Alternatively, the nucleic acid can be 
introduced into virtually all cells in an organism by transfecting a cell in culture, such 
as an embryonic stem cell, as described herein for the production of transgenic 
animals, and this cell can be used to produce an entire transgenic organism. As 
described, in a further embodiment, the host cell can be a fertilized oocyte. Such cells 
are then allowed to develop in a female foster animal to produce the transgenic 

Pharmaceutical Compositions 

Sulfatase nucleic acid molecules, proteins, modulators of the protein, and 
antibodies (also referred to herein as "active compounds") can be incorporated into 
pharmaceutical compositions suitable for administration to a subject, e.g., a human. 
Such compositions typically comprise the nucleic acid molecule, protein, modulator, or 
antibody and a pharmaceutically acceptable carrier. 

The term "administer" is used in its broadest sense and includes any method of 
introducing the compositions of the present invention into a subject. This includes 
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producing polypeptides or polynucleotides in vivo by in vivo transcription or translation 
of polynucleotides that have been exogenously introduced into a subject. Thus, 
polypeptides or nucleic acids produced in the subject from the exogenous compositions 
are encompassed in the term "administer.'' 

As used herein the language "pharmaceutically acceptable carrier" is intended to
include any and all solvents, dispersion media, coatings, antibacterial and antifungal
agents, isotonic and absorption delaying agents, and the like, compatible with
pharmaceutical administration. The use of such media and agents for pharmaceutically
active substances is well known in the art. Except insofar as any conventional media or
agent is incompatible with the active compound, such media can be used in the
compositions of the invention. Supplementary active compounds can also be
incorporated into the compositions. A pharmaceutical composition of the invention is
formulated to be compatible with its intended route of administration. Examples of
routes of administration include parenteral, e.g., intravenous, intradermal, subcutaneous,
oral (e.g., inhalation), transdermal (topical), transmucosal, and rectal administration.
Solutions or suspensions used for parenteral, intradermal, or subcutaneous application
can include the following components: a sterile diluent such as water for injection, saline
solution, fixed oils, polyethylene glycols, glycerine, propylene glycol or other synthetic
solvents; antibacterial agents such as benzyl alcohol or methyl parabens; antioxidants
such as ascorbic acid or sodium bisulfite; chelating agents such as
ethylenediaminetetraacetic acid; buffers such as acetates, citrates or phosphates; and
agents for the adjustment of tonicity such as sodium chloride or dextrose. pH can be
adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. The
parenteral preparation can be enclosed in ampules, disposable syringes or multiple dose
vials made of glass or plastic.

Pharmaceutical compositions suitable for injectable use include sterile aqueous
solutions (where water soluble) or dispersions and sterile powders for the
extemporaneous preparation of sterile injectable solutions or dispersions. For intravenous
administration, suitable carriers include physiological saline, bacteriostatic water,
Cremophor EL™ (BASF, Parsippany, NJ) or phosphate buffered saline (PBS). In all
cases, the composition must be sterile and should be fluid to the extent that easy
syringability exists. It must be stable under the conditions of manufacture and storage
and must be preserved against the contaminating action of microorganisms such as
bacteria and fungi. The carrier can be a solvent or dispersion medium containing, for
example, water, ethanol, polyol (for example, glycerol, propylene glycol, and liquid
polyethylene glycol, and the like), and suitable mixtures thereof. The proper fluidity can
be maintained, for example, by the use of a coating such as lecithin, by the maintenance
of the required particle size in the case of dispersion and by the use of surfactants.
Prevention of the action of microorganisms can be achieved by various antibacterial and
antifungal agents, for example, parabens, chlorobutanol, phenol, ascorbic acid,
thimerosal, and the like. In many cases, it will be preferable to include isotonic agents,
for example, sugars, polyalcohols such as mannitol and sorbitol, or sodium chloride in the
composition. Prolonged absorption of the injectable compositions can be brought about
by including in the composition an agent which delays absorption, for example,
aluminum monostearate and gelatin.

Sterile injectable solutions can be prepared by incorporating the active
compound (e.g., a sulfatase protein or anti-sulfatase antibody) in the required amount in
an appropriate solvent with one or a combination of ingredients enumerated above, as
required, followed by filtered sterilization. Generally, dispersions are prepared by
incorporating the active compound into a sterile vehicle which contains a basic
dispersion medium and the required other ingredients from those enumerated above. In
the case of sterile powders for the preparation of sterile injectable solutions, the preferred
methods of preparation are vacuum drying and freeze-drying, which yield a powder of
the active ingredient plus any additional desired ingredient from a previously sterile-
filtered solution thereof.

Oral compositions generally include an inert diluent or an edible carrier. They
can be enclosed in gelatin capsules or compressed into tablets. For oral administration,
the agent can be contained in enteric forms to survive the stomach or further coated or
mixed to be released in a particular region of the GI tract by known methods. For the
purpose of oral therapeutic administration, the active compound can be incorporated
with excipients and used in the form of tablets, troches, or capsules. Oral compositions
can also be prepared using a fluid carrier for use as a mouthwash, wherein the compound
in the fluid carrier is applied orally and swished and expectorated or swallowed.
Pharmaceutically compatible binding agents and/or adjuvant materials can be included
as part of the composition. The tablets, pills, capsules, troches and the like can contain
any of the following ingredients, or compounds of a similar nature: a binder such as
microcrystalline cellulose, gum tragacanth or gelatin; an excipient such as starch or
lactose; a disintegrating agent such as alginic acid, Primogel, or corn starch; a lubricant
such as magnesium stearate or Sterotes; a glidant such as colloidal silicon dioxide; a
sweetening agent such as sucrose or saccharin; or a flavoring agent such as peppermint,
methyl salicylate, or orange flavoring.

For administration by inhalation, the compounds are delivered in the form of an
aerosol spray from a pressurized container or dispenser, which contains a suitable
propellant, e.g., a gas such as carbon dioxide, or a nebulizer.

Systemic administration can also be by transmucosal or transdermal means. For
transmucosal or transdermal administration, penetrants appropriate to the barrier to be
permeated are used in the formulation. Such penetrants are generally known in the art,
and include, for example, for transmucosal administration, detergents, bile salts, and
fusidic acid derivatives. Transmucosal administration can be accomplished through the
use of nasal sprays or suppositories. For transdermal administration, the active
compounds are formulated into ointments, salves, gels, or creams as generally known in
the art.

The compounds can also be prepared in the form of suppositories (e.g., with
conventional suppository bases such as cocoa butter and other glycerides) or retention
enemas for rectal delivery.

In one embodiment, the active compounds are prepared with carriers that will
protect the compound against rapid elimination from the body, such as a controlled
release formulation, including implants and microencapsulated delivery systems.
Biodegradable, biocompatible polymers can be used, such as ethylene vinyl acetate,
polyanhydrides, polyglycolic acid, collagen, polyorthoesters, and polylactic acid.
Methods for preparation of such formulations will be apparent to those skilled in the art.
The materials can also be obtained commercially from Alza Corporation and Nova
Pharmaceuticals, Inc. Liposomal suspensions (including liposomes targeted to infected
cells with monoclonal antibodies to viral antigens) can also be used as pharmaceutically
acceptable carriers. These can be prepared according to methods known to those skilled
in the art, for example, as described in U.S. Patent No. 4,522,811.

It is especially advantageous to formulate oral or parenteral compositions in
dosage unit form for ease of administration and uniformity of dosage. "Dosage unit
form" as used herein refers to physically discrete units suited as unitary dosages for the
subject to be treated; each unit containing a predetermined quantity of active compound
calculated to produce the desired therapeutic effect in association with the required
pharmaceutical carrier. The specification for the dosage unit forms of the invention is
dictated by and directly dependent on the unique characteristics of the active compound
and the particular therapeutic effect to be achieved, and the limitations inherent in the art
of compounding such an active compound for the treatment of individuals.

The nucleic acid molecules of the invention can be inserted into vectors and used
as gene therapy vectors. Gene therapy vectors can be delivered to a subject by, for
example, intravenous injection, local administration (U.S. 5,328,470) or by stereotactic
injection (see e.g., Chen et al. (1994) PNAS 91:3054-3057). The pharmaceutical
preparation of the gene therapy vector can include the gene therapy vector in an
acceptable diluent, or can comprise a slow release matrix in which the gene delivery
vehicle is embedded. Alternatively, where the complete gene delivery vector can be
produced intact from recombinant cells, e.g., retroviral vectors, the pharmaceutical
preparation can include one or more cells which produce the gene delivery system.

As defined herein, a therapeutically effective amount of protein or polypeptide
(i.e., an effective dosage) ranges from about 0.001 to 30 mg/kg body weight,
preferably about 0.01 to 25 mg/kg body weight, more preferably about 0.1 to 20
mg/kg body weight, and even more preferably about 1 to 10 mg/kg, 2 to 9 mg/kg, 3 to
8 mg/kg, 4 to 7 mg/kg, or 5 to 6 mg/kg body weight.
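
As a worked illustration of the weight-based ranges above, the following minimal Python sketch converts a mg/kg range into an absolute range in milligrams for a given body weight. The function name, the dictionary labels, and the 70 kg example weight are assumptions introduced here for illustration; the sketch is plain arithmetic and is not dosing guidance.

```python
# Illustrative arithmetic only. The labels and the 70 kg example weight are
# hypothetical; the mg/kg ranges are the ones recited in the text above.
DOSE_RANGES_MG_PER_KG = {
    "broad": (0.001, 30.0),
    "preferred": (0.01, 25.0),
    "more preferred": (0.1, 20.0),
    "even more preferred": (1.0, 10.0),
}


def absolute_dose_range(body_weight_kg, range_mg_per_kg):
    """Return (low, high) in mg for a subject of the given body weight."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low, high = range_mg_per_kg
    return low * body_weight_kg, high * body_weight_kg


if __name__ == "__main__":
    for label, mg_per_kg in DOSE_RANGES_MG_PER_KG.items():
        low, high = absolute_dose_range(70.0, mg_per_kg)
        print(f"{label}: {low:g} to {high:g} mg")
```

For the assumed 70 kg subject, the "even more preferred" range of 1 to 10 mg/kg corresponds to 70 to 700 mg.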

The skilled artisan will appreciate that certain factors may influence the
dosage required to effectively treat a subject, including but not limited to the severity
of the disease or disorder, previous treatments, the general health and/or age of the
subject, and other diseases present. Moreover, treatment of a subject with a
therapeutically effective amount of a protein, polypeptide, or antibody can include a
single treatment or, preferably, can include a series of treatments. In a preferred
example, a subject is treated with antibody, protein, or polypeptide in the range of
between about 0.1 to 20 mg/kg body weight, one time per week for between about 1
to 10 weeks, preferably between 2 to 8 weeks, more preferably between about 3 to 7
weeks, and even more preferably for about 4, 5, or 6 weeks. It will also be
appreciated that the effective dosage of antibody, protein, or polypeptide used for
treatment may increase or decrease over the course of a particular treatment. Changes
in dosage may result and become apparent from the results of diagnostic assays as



activity. An agent may, for example, be a small molecule. For example, such small 
molecules include, but are not limited to, peptides, peptidomimetics, amino acids,
amino acid analogs, polynucleotides, polynucleotide analogs, nucleotides, nucleotide 
analogs, organic or inorganic compounds (i.e., including heteroorganic and 
organometallic compounds) having a molecular weight less than about 10,000 grams 
per mole, organic or inorganic compounds having a molecular weight less than about 
5,000 grams per mole, organic or inorganic compounds having a molecular weight 
less than about 1,000 grams per mole, organic or inorganic compounds having a 
molecular weight less than about 500 grams per mole, and salts, esters, and other 
pharmaceutically acceptable forms of such compounds. 

It is understood that appropriate doses of small molecule agents depend upon
a number of factors within the ken of the ordinarily skilled physician, veterinarian, or
researcher. The dose(s) of the small molecule will vary, for example, depending upon
the identity, size, and condition of the subject or sample being treated, further
depending upon the route by which the composition is to be administered, if
applicable, and the effect which the practitioner desires the small molecule to have
upon the nucleic acid or polypeptide of the invention. Exemplary doses include
milligram or microgram amounts of the small molecule per kilogram of subject or
sample weight (e.g., about 1 microgram per kilogram to about 500 milligrams per
kilogram, about 100 micrograms per kilogram to about 5 milligrams per kilogram, or
about 1 microgram per kilogram to about 50 micrograms per kilogram). It is
furthermore understood that appropriate doses of a small molecule depend upon the
potency of the small molecule with respect to the expression or activity to be
modulated. Such appropriate doses may be determined using the assays described
herein. When one or more of these small molecules is to be administered to an animal
(e.g., a human) in order to modulate expression or activity of a polypeptide or nucleic
acid of the invention, a physician, veterinarian, or researcher may, for example,
prescribe a relatively low dose at first, subsequently increasing the dose until an
appropriate response is obtained. In addition, it is understood that the specific dose
level for any particular animal subject will depend upon a variety of factors including
the activity of the specific compound employed, the age, body weight, general health,
gender, and diet of the subject, the time of administration, the route of administration,
the rate of excretion, any drug combination, and the degree of expression or activity to
be modulated.

The pharmaceutical compositions can be included in a container, pack, or
dispenser together with instructions for administration.



Other Embodiments 

In another aspect, the invention features a method of analyzing a plurality of
capture probes. The method can be used, e.g., to analyze gene expression. The
method includes: providing a two dimensional array having a plurality of addresses,
each address of the plurality being positionally distinguishable from each other
address of the plurality, and each address of the plurality having a unique capture
probe, e.g., a nucleic acid or peptide sequence; contacting the array with a 22438,
23553, 25278, or 26212 nucleic acid, preferably purified, polypeptide, preferably
purified, or antibody, and thereby evaluating the plurality of capture probes. Binding,
e.g., in the case of a nucleic acid, hybridization with a capture probe at an address of
the plurality, is detected, e.g., by signal generated from a label attached to the 22438,
23553, 25278, or 26212 nucleic acid, polypeptide, or antibody.

The capture probes can be a set of nucleic acids from a selected sample, e.g., a 
sample of nucleic acids derived from a control or non-stimulated tissue or cell. 

The method can include contacting the 22438, 23553, 25278, or 26212 nucleic
acid, polypeptide, or antibody with a first array having a plurality of capture probes
and a second array having a different plurality of capture probes. The results of each
hybridization can be compared, e.g., to analyze differences in expression between a
first and second sample. The first plurality of capture probes can be from a control
sample, e.g., a wild type, normal, or non-diseased, non-stimulated, sample, e.g., a
biological fluid, tissue, or cell sample. The second plurality of capture probes can be
from an experimental sample, e.g., a mutant type, at risk, disease-state or disorder-
state, or stimulated, sample, e.g., a biological fluid, tissue, or cell sample.
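
The comparison of hybridization results between a control array and an experimental array can be sketched as follows. This is a minimal illustration, not a method recited in the text: it assumes that each array has been reduced to a mapping from address (row, column) to a background-corrected signal, that the same address holds comparable capture probes on both arrays, and that a log2 ratio with a two-fold cutoff is an acceptable way to flag differences. All names and thresholds are hypothetical.

```python
import math


def log2_ratios(control, experimental, floor=1.0):
    """Return {address: log2(experimental / control)} for shared addresses.

    Signals below `floor` are raised to `floor` so the ratio is always defined.
    """
    ratios = {}
    for address in control.keys() & experimental.keys():
        c = max(control[address], floor)
        e = max(experimental[address], floor)
        ratios[address] = math.log2(e / c)
    return ratios


def changed_addresses(ratios, threshold=1.0):
    """Addresses whose absolute log2 ratio meets the assumed threshold."""
    return sorted(a for a, r in ratios.items() if abs(r) >= threshold)


if __name__ == "__main__":
    # Hypothetical signals keyed by (row, column) address.
    control = {(0, 0): 100.0, (0, 1): 200.0}
    experimental = {(0, 0): 400.0, (0, 1): 200.0}
    ratios = log2_ratios(control, experimental)
    print(changed_addresses(ratios))
```

Keying the signals by address mirrors the requirement that each address be positionally distinguishable, so a difference can be traced back to a single capture probe.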

The plurality of capture probes can be a plurality of nucleic acid probes each
of which specifically hybridizes with an allele of 22438, 23553, 25278, or 26212.
Such methods can be used to diagnose a subject, e.g., to evaluate risk for a disease or
disorder, to evaluate suitability of a selected treatment for a subject, or to evaluate
whether a subject has a disease or disorder. 22438, 23553, 25278, and 26212 are
associated with sulfatase activity; thus the method is useful for disorders associated with
abnormal sulfatase activity.

The method can be used to detect SNPs, as described below. 

In another aspect, the invention features a method of analyzing a plurality of
probes. The method is useful, e.g., for analyzing gene expression. The method
includes: providing a two dimensional array having a plurality of addresses, each
address of the plurality being positionally distinguishable from each other address of
the plurality, and each address of the plurality having a unique capture probe, e.g.,
wherein the capture probes are from
a cell or subject which expresses or misexpresses 22438, 23553, 25278, or 26212, or from
a cell or subject in which a 22438, 23553, 25278, or 26212 mediated response has
been elicited, e.g., by contact of the cell with 22438, 23553, 25278, or 26212 nucleic
acid or protein, or administration to the cell or subject of 22438, 23553, 25278, or 26212
nucleic acid or protein; contacting the array with one or more inquiry probes, wherein
an inquiry probe can be a nucleic acid, polypeptide, or antibody (which is preferably
other than 22438, 23553, 25278, or 26212 nucleic acid, polypeptide, or antibody);
providing a two dimensional array having a plurality of addresses, each address of the
plurality being positionally distinguishable from each other address of the plurality,
and each address of the plurality having a unique capture probe, e.g., wherein the
capture probes are from a cell or subject which does not express 22438, 23553,
25278, or 26212 (or does not express as highly as in the case of the 22438, 23553,
25278, or 26212 positive plurality of capture probes) or from a cell or subject
in which a 22438, 23553, 25278, or 26212 mediated response has not been elicited (or
has been elicited to a lesser extent than in the first sample); contacting the array with
one or more inquiry probes (which are preferably other than a 22438, 23553, 25278, or
26212 nucleic acid, polypeptide, or antibody), and thereby evaluating the plurality of
capture probes. Binding, e.g., in the case of a nucleic acid, hybridization with a
capture probe at an address of the plurality, is detected, e.g., by signal generated from
a label attached to the nucleic acid, polypeptide, or antibody.

In another aspect, the invention features a method of analyzing 22438, 23553,
25278, or 26212, e.g., analyzing structure, function, or relatedness to other nucleic
acid or amino acid sequences. The method includes: providing a 22438, 23553,
25278, or 26212 nucleic acid or amino acid sequence; comparing the 22438, 23553,
25278, or 26212 sequence with one or more, preferably a plurality of, sequences from a
collection of sequences, e.g., a nucleic acid or protein sequence database; to thereby
analyze 22438, 23553, 25278, or 26212.

Preferred databases include GenBank™. The method can include evaluating
the sequence identity between a 22438, 23553, 25278, or 26212 sequence and a
database sequence. The method can be performed by accessing the database at a
second site, e.g., over the internet.
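
Evaluating sequence identity between a query sequence and a database sequence can be illustrated with a short Python sketch. It assumes the two sequences have already been aligned to equal length with "-" as the gap character, and it uses one of several possible definitions of percent identity (identical columns divided by all columns that are not gaps in both sequences). The function name and the example strings are hypothetical and are not taken from any SEQ ID NO.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over two pre-aligned sequences of equal length.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap and y == gap:
            continue  # skip columns that are gaps in both sequences
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments, three of four columns identical.
    print(percent_identity("ACGT", "ACGA"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values, so the definition in use should be stated alongside any reported identity.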

In another aspect, the invention features a set of oligonucleotides, useful, e.g.,
for identifying SNPs, or identifying specific alleles of 22438, 23553, 25278, or
26212. The set includes a plurality of oligonucleotides, each of which has a different
nucleotide at an interrogation position, e.g., an SNP or the site of a mutation. In a
preferred embodiment, the oligonucleotides of the plurality are identical in sequence with
one another (except for differences in length). The oligonucleotides can be provided
with different labels, such that an oligonucleotide which hybridizes to one allele
provides a signal that is distinguishable from an oligonucleotide which hybridizes to
a second allele.
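
A set of oligonucleotides that differ only at an interrogation position can be sketched in Python as below. The template sequence, the 0-based position, and the function name are hypothetical placeholders chosen for illustration; they are not derived from the 22438, 23553, 25278, or 26212 sequences.

```python
def allele_specific_oligos(template, position, bases="ACGT"):
    """Return {base: oligo}; each oligo equals `template` except at `position`.

    `position` is a 0-based index of the interrogation position.
    """
    if not 0 <= position < len(template):
        raise ValueError("position outside template")
    return {
        base: template[:position] + base + template[position + 1:]
        for base in bases
    }


if __name__ == "__main__":
    # Hypothetical 8-mer with the interrogation position at index 3.
    for base, oligo in allele_specific_oligos("ACGTACGT", 3).items():
        print(base, oligo)
```

Each entry could then be paired with a distinct label so that hybridization to one allele is distinguishable from hybridization to another.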

This invention is further illustrated by the following examples which should
not be construed as limiting. The contents of all references, patents and published
patent applications cited throughout this application are incorporated herein by
reference.

EXAMPLES

Example 1: Identification and Characterization of Human 22438 cDNAs

The human 22438 sequence (Figure 1A-B; SEQ ID NO:2), which is
approximately 2175 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1578 nucleotides
(nucleotides 248-1825 of SEQ ID NO:2; SEQ ID NO:11). The coding sequence
encodes a 525 amino acid protein (SEQ ID NO:1).
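
The relationship between the coding coordinates and the protein length stated above can be checked with a few lines of Python. The sketch assumes, as the reported figures suggest but the text does not state outright, that the coding range is 1-based and inclusive and that its final codon is an untranslated stop codon. The function name is hypothetical.

```python
def orf_summary(start, end):
    """Return (length in nucleotides, encoded residues) for a coding range.

    Assumes 1-based inclusive coordinates and a terminal stop codon that is
    not translated.
    """
    length_nt = end - start + 1
    if length_nt % 3 != 0:
        raise ValueError("coding range is not a whole number of codons")
    return length_nt, length_nt // 3 - 1


if __name__ == "__main__":
    # Coordinates of the 22438 coding sequence as given in the text.
    print(orf_summary(248, 1825))  # (1578, 525)
```

Under these assumptions, nucleotides 248-1825 span 1578 nucleotides, or 526 codons, which corresponds to 525 amino acids plus a stop codon.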

PFAM analysis indicates that 22438 contains a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/soft

As used herein, the term "sulfatase domain" includes an amino acid sequence
of about 80-420 amino acid residues in length and having a bit score for the alignment
of the sequence to the sulfatase domain (HMM) of at least 8. Preferably, a sulfatase
domain includes at least about 100-250 amino acids, more preferably about 130-200
amino acid residues, or about 160-200 amino acids and has a bit score for the
alignment of the sequence to the sulfatase domain (HMM) of at least 16 or greater.
The sulfatase domain (HMM) has been assigned the PFAM Accession PF00884
(http://pfam.wustl.edu/). An alignment of the sulfatase domain (amino acids 36-462
of SEQ ID NO:1) of human 22438 with a consensus amino acid sequence derived
from a hidden Markov model is depicted in Figure 19.
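
The length and bit-score criteria in the definition above can be expressed as a simple predicate. The sketch below is an illustration under stated assumptions: it treats the "about" ranges as hard bounds, which the text does not require, and the function name and example values are hypothetical. It does not run an HMM search; it only applies the thresholds to a length and bit score obtained elsewhere.

```python
def meets_domain_criteria(length_aa, bit_score,
                          length_range=(80, 420), min_bit_score=8.0):
    """True if a candidate domain satisfies the length and bit-score bounds.

    Defaults follow the general definition in the text (about 80-420
    residues, bit score of at least 8), read here as hard limits.
    """
    low, high = length_range
    return low <= length_aa <= high and bit_score >= min_bit_score


if __name__ == "__main__":
    # Hypothetical candidate: 180 residues aligned with a bit score of 20.
    print(meets_domain_criteria(180, 20.0))
    # Same candidate against the preferred criteria (about 100-250, >= 16).
    print(meets_domain_criteria(180, 20.0, (100, 250), 16.0))
```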

In a preferred embodiment, a 22438-like polypeptide or protein has a "sulfatase
domain" or a region which includes at least about 100-250, more preferably about
130-200 or 160-200, amino acid residues and has at least about 60%, 70%, 80%, 90%,
95%, 99%, or 100% sequence identity with a "sulfatase domain," e.g., the sulfatase
domain of human 22438-like polypeptide or protein (e.g., amino acid residues 36-462
of SEQ ID NO:1).

To identify the presence of a "sulfatase" domain in a 22438-like protein
sequence, and make the determination that a polypeptide or protein of interest has a
particular profile, the amino acid sequence of the protein can be searched against a
database of HMMs (e.g., the Pfam database, release 2.1) using the default parameters
(http://www.sanger.ac.uk/Software/Pfam/HMM_search). For example, the hmmsf
program, which is available as part of the HMMER package of search programs, is a
family specific default program for MILPAT0063 and a score of 15 is the default
threshold score for determining a hit. Alternatively, the threshold score for
determining a hit can be lowered (e.g., to 8 bits). A description of the Pfam database
can be found in Sonnhammer et al. (1997) Proteins 28:405-420 and a detailed
description of HMMs can be found, for example, in Gribskov et al. (1990) Meth.
Enzymol. 183:146-159; Gribskov et al. (1987) Proc. Natl. Acad. Sci. USA 84:4355-
4358; Krogh et al. (1994) J. Mol. Biol. 235:1501-1531; and Stultz et al. (1993)
Protein Sci. 2:305-314, the contents of which are incorporated herein by reference.

Example 2: Tissue Distribution of 22438 mRNA

Northern blot hybridizations with various RNA samples are performed under
standard conditions and washed under stringent conditions, i.e., 0.2 X SSC at 65°C.
A DNA probe corresponding to all or a portion of the 22438 cDNA (SEQ ID NO:2)
can be used. The DNA is radioactively labeled with 32P-dCTP using the Prime-It Kit
(Stratagene, La Jolla, CA) according to the instructions of the supplier. Filters
containing mRNA from mouse hematopoietic and endocrine tissues, and cancer cell
lines (Clontech, Palo Alto, CA) are probed in ExpressHyb hybridization solution
(Clontech) and washed at high stringency according to manufacturer's
recommendations.



Example 3: Identification and Characterization of Human 23553 cDNAs

The human 23553 sequence (Figure 5A-B; SEQ ID NO:4), which is
approximately 4321 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 2616 nucleotides
(nucleotides 510-3125 of SEQ ID NO:4; SEQ ID NO:12). The coding sequence
encodes an 871 amino acid protein (SEQ ID NO:3).

PFAM analysis indicates that 23553 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Proteins 28:405-420 and
http://www.psc.edu/general/software/packages/pfam/pfam.html. An alignment of the
sulfatase domain (amino acids 43 to 467 of SEQ ID NO:3) of human 23553-like with
a consensus amino acid sequence derived from a hidden Markov model is depicted in
Figure 20. For further information on sulfatase domains, see Example 1.

In one embodiment, a 23553-like protein includes at least one transmembrane
domain. As used herein, the term "transmembrane domain" includes an amino acid
sequence of about 15 amino acid residues in length that spans a phospholipid
membrane. More preferably, a transmembrane domain includes about at least 18, 20,
22, or 24 amino acid residues and spans a phospholipid membrane. Transmembrane
domains are rich in hydrophobic residues, and typically have an α-helical structure.
In a preferred embodiment, at least 50%, 60%, 70%, 80%, 90%, 95% or more of the
amino acids of a transmembrane domain are hydrophobic, e.g., leucines, isoleucines,
tyrosines, or tryptophans. Transmembrane domains are described in, for example,
http://pfam.wustl.edu and Zagotta W.N. et al. (1996)
Annual Rev. Neurosci. 19:235-63, the contents of which are incorporated herein by
reference.
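
The hydrophobic-content criterion for a transmembrane domain can be illustrated by computing the fraction of hydrophobic residues in a candidate segment. The sketch below uses only the four residues named as examples in the text (leucine, isoleucine, tyrosine, tryptophan); treating that list as the complete hydrophobic set is an assumption, as are the function names, the default thresholds, and the example segment, which is not taken from SEQ ID NO:3.

```python
# One-letter codes for the residues named as examples in the text:
# leucine (L), isoleucine (I), tyrosine (Y), tryptophan (W).
HYDROPHOBIC = frozenset("LIYW")


def hydrophobic_fraction(segment, hydrophobic=HYDROPHOBIC):
    """Fraction of residues in `segment` that belong to `hydrophobic`."""
    if not segment:
        raise ValueError("segment is empty")
    segment = segment.upper()
    return sum(1 for residue in segment if residue in hydrophobic) / len(segment)


def is_candidate_tm(segment, min_length=15, min_fraction=0.5):
    """True if the segment meets the assumed length and hydrophobicity bounds."""
    return (len(segment) >= min_length
            and hydrophobic_fraction(segment) >= min_fraction)


if __name__ == "__main__":
    # Hypothetical 20-residue segment, 16 of 20 residues hydrophobic.
    segment = "LLIIWWYYAGLLIIWWYYAG"
    print(hydrophobic_fraction(segment), is_candidate_tm(segment))
```

A fuller treatment would use a hydropathy scale and a sliding window; the fixed residue set here is only meant to make the percentage thresholds concrete.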

In a preferred embodiment, a 23553-like polypeptide or protein has at least
one transmembrane domain or a region which includes at least 18, 20, 22, or 24 amino
acid residues and has at least about 60%, 70%, 80%, 90%, 95%, 99%, or 100%
sequence identity with a "transmembrane domain," e.g., at least one transmembrane
domain of human 23553 (e.g., amino acid residues 7 to 25 of SEQ ID NO:3).

In another embodiment, a 23553 protein includes at least one "non-
transmembrane domain." As used herein, "non-transmembrane domains" are
domains that reside outside of the membrane. When referring to plasma membranes,
non-transmembrane domains include extracellular domains (i.e., outside of the cell)
and intracellular domains (i.e., within the cell). When referring to membrane-bound
proteins found in intracellular organelles (e.g., mitochondria, endoplasmic reticulum,
peroxisomes and microsomes), non-transmembrane domains include those domains of
the protein that reside in the cytosol (i.e., the cytoplasm), the lumen of the organelle,
or the matrix or the intermembrane space (the latter two relate specifically to
mitochondria organelles). The C-terminal amino acid residue of a non-transmembrane
domain is adjacent to an N-terminal amino acid residue of a transmembrane domain
in a naturally occurring 23553-like protein.

In a preferred embodiment, a 23553-like polypeptide or protein has a "non-
transmembrane domain" or a region which includes at least about 1-350, preferably
about 200-320, more preferably about 230-300, and even more preferably about 240-
280 amino acid residues, and has at least about 60%, 70%, 80%, 90%, 95%, 99% or
100% sequence identity with a "non-transmembrane domain", e.g., a non-



A non-transmembrane domain located at the N-terminus of a 23553-like
protein or polypeptide is referred to herein as an "N-terminal non-transmembrane
domain." As used herein, an "N-terminal non-transmembrane domain" includes an
amino acid sequence having about 1-100 amino acid residues. For example, an N-terminal non-
transmembrane domain is located at about amino acid residues 1 to 6 of SEQ ID
NO:3.

Similarly, a non-transmembrane domain located at the C-terminus of a 23553-
like protein or polypeptide is referred to herein as a "C-terminal non-transmembrane
domain." As used herein, a "C-terminal non-transmembrane domain" includes an
amino acid sequence having about 1-800, preferably about 15-500, preferably about
20-270, more preferably about 25-255 amino acid residues in length and is located
outside the boundaries of a membrane. For example, a C-terminal non-
transmembrane domain is located at about amino acid residues 26-871 of SEQ ID
NO:3.
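
The domain boundaries given for 23553 (N-terminal non-transmembrane residues 1 to 6, transmembrane residues 7 to 25, C-terminal non-transmembrane residues 26-871 of SEQ ID NO:3) can be checked for contiguity with a short Python sketch. The coordinates come from the text; the data structure and function name are hypothetical, and the coordinates are assumed to be 1-based and inclusive.

```python
# Segment boundaries as recited in the text for SEQ ID NO:3 (871 residues),
# read here as 1-based inclusive coordinates.
SEGMENTS_23553 = [
    ("N-terminal non-transmembrane domain", 1, 6),
    ("transmembrane domain", 7, 25),
    ("C-terminal non-transmembrane domain", 26, 871),
]


def check_contiguous(segments, protein_length):
    """Return {name: length} after verifying the segments tile the protein."""
    expected_start = 1
    lengths = {}
    for name, start, end in segments:
        if start != expected_start or end < start:
            raise ValueError(f"{name} does not follow the previous segment")
        lengths[name] = end - start + 1
        expected_start = end + 1
    if expected_start != protein_length + 1:
        raise ValueError("segments do not cover the full protein")
    return lengths


if __name__ == "__main__":
    for name, length in check_contiguous(SEGMENTS_23553, 871).items():
        print(f"{name}: {length} residues")
```

The three segments have 6, 19, and 846 residues respectively, which together account for all 871 residues.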

The ORF analyzer predicts that 23553 has a signal peptide. Therefore, a
23553-like molecule can further include a signal sequence. As used herein, a "signal
sequence" refers to a peptide of about 20-80 amino acid residues in length which
occurs at the N-terminus of secretory and integral membrane proteins and which
contains a majority of hydrophobic amino acid residues. For example, a signal
sequence contains at least about 12-25 amino acid residues, preferably about 30-70
amino acid residues, and has at least about 40-70%, preferably about 50-65%, and
more preferably about 55-60% hydrophobic amino acid residues (e.g., alanine, valine,
leucine, isoleucine, phenylalanine, tyrosine, tryptophan, or proline). Such a "signal
sequence", also referred to in the art as a "signal peptide", serves to direct a protein
containing such a sequence to a lipid bilayer. For example, in one embodiment, a
23553-like protein contains a signal sequence of about amino acids 1-22 of SEQ ID
NO:3. The "signal sequence" is cleaved during processing of the mature protein. The
mature 23553-like protein corresponds to amino acids 23-871 of SEQ ID NO:3.

CLUSTAL multiple sequence alignment analysis shows homology between
23553 and the following sequences (identified by GenBank accession number):
P14217, Chlamydomonas reinhardtii arylsulfatase; Q10723, Volvox carteri
arylsulfatase; CAB40661, human N-acetylglucosamine-6-sulfatase homolog; P15586,
human N-acetylglucosamine-6-sulfatase; P50426, goat N-acetylglucosamine-6-
sulfatase; AAA83618, C. elegans putative sulfatase; AAC02716, Neurospora crassa
arylsulfatase; P31447, E. coli hypothetical sulfatase.

Example 4: Tissue Distribution of 23553 mRNA

In normal human tissues tested, high expression of 23553 was observed in
trachea, vein, osteoblast, kidney, and testes. Significant expression of 23553 was
found in adipose, colon, skeletal muscle, thyroid, prostate, and other tissues. See
Figure 25. In comparisons of normal and tumor tissue, 23553 expression was detected
in all samples tested, with increased expression in breast, colon, and lung tumors. See
Figure 26. Further, elevated expression of 23553 was found in glioblastoma samples
as compared to normal brain tissue samples. Expression levels were determined by
quantitative PCR (Taqman® brand quantitative PCR kit, Applied Biosystems). The
quantitative PCR reactions were performed according to the kit manufacturer's
instructions.

cDNA library array analysis of 23553 revealed expression in adipose, adrenal
gland, bone, brain, colon, colon metastases to liver, endothelial, heart, liver, lung,
muscle, osteoblast, skin, testes, thyroid, and other tissue. Reverse transcriptase
polymerase chain reaction (RT-PCR) revealed 23553 expression in clinical samples of
normal and tumor colon tissue, normal and metastatic liver tissue, and in lung
squamous cell carcinoma tissue. In situ hybridization showed expression of 23553 in
the following tissues: 3 of 3 breast tumor; 0 of 2 normal breast; 4 of 4 lung tumor; 0
of 2 normal lung; 4 of 4 colon tumor; and 2 of 2 liver metastases. In all cases,
expression of 23553 was confined to the stromal component of tissue; no expression
was detected in normal or tumor epithelium.

Angiogenic growth factors (e.g., bFGF) are present in the extracellular matrix
(ECM), and can be released from the ECM by heparinase-like enzymes. This
includes the glycosyl-sulfatases. The released growth factors in turn stimulate blood
vessel formation. See Baird A, Ling N., "Fibroblast growth factors are present in the
extracellular matrix produced by endothelial cells in vitro: implications for a role of
heparinase-like enzymes in the neovascular response," Biochem Biophys Res
Commun. (1987) 142(2):428-35.

As noted, 23553 has amino acid sequence features that place it in the class of
glycosyl sulfate cleaving enzymes. Taqman results (above) show that its expression
is elevated in clinical tumor samples. In situ hybridization shows specific, localized
23553 expression in the tumor stromal component of all tumor samples tested,
whereas its expression is low or absent in normal tissues. This suggests that, through
catalytic activity, 23553 promotes tumor growth or is involved in tumor maintenance
by degrading the ECM and releasing growth factors.

Example 5: Identification and Characterization of Human 25278 cDNAs 

The human 25278 sequence (Figure 10A-B; SEQ ID NO:6), which is
approximately 2940 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1710 nucleotides
(nucleotides 334-2043 of SEQ ID NO:6; SEQ ID NO:13). The coding sequence
encodes a 569 amino acid protein (SEQ ID NO:5).
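
The coordinates quoted above can be cross-checked with simple arithmetic. The following is a minimal Python sketch, not part of the original disclosure; the function name `cds_summary` is hypothetical, and it assumes the quoted nucleotide range is 1-based, inclusive, and ends with a single stop codon.

```python
def cds_summary(start: int, end: int) -> dict:
    """Summarise a coding sequence from 1-based, inclusive nucleotide coordinates.

    Assumption (not stated in the text): the final codon is a stop codon,
    so the encoded protein has one residue fewer than the codon count.
    """
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    codons = length // 3
    return {"nucleotides": length, "codons": codons, "residues": codons - 1}


# Coordinates quoted for human 25278 (nucleotides 334-2043 of SEQ ID NO:6).
summary_25278 = cds_summary(334, 2043)
print(summary_25278)
```

Under these assumptions the range 334-2043 spans 1710 nucleotides, i.e. 570 codons, which agrees with the stated 569 amino acid protein plus one stop codon.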

PFAM analysis indicates that 25278 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Protein 28:405-420 and
http://www.psc.edu/genera... An alignment of the
sulfatase domain (amino acids 47 to 471 of SEQ ID NO:5) of human 25278 with a
consensus amino acid sequence derived from a hidden Markov model is depicted in
Figure 27. For further information on sulfatase domains, see Example 1.

Example 6: Identification and Characterization of Human 26212 cDNAs

The human 26212 sequence (Figure 15; SEQ ID NO:8), which is
approximately 2253 nucleotides long including untranslated regions, contains a
predicted methionine-initiated coding sequence of about 1800 nucleotides
(nucleotides 324-2123 of SEQ ID NO:8; SEQ ID NO:14). The coding sequence
encodes a 599 amino acid protein (SEQ ID NO:7).

PFAM analysis indicates that 26212 has a sulfatase domain. For general
information regarding PFAM identifiers, PS prefix and PF prefix domain
identification numbers, refer to Sonnhammer et al. (1997) Protein 28:405-420 and
http://www.psc.edu/general/sof... An alignment of the
sulfatase domain (amino acids 76-502 of SEQ ID NO:7) of human 26212 with a
consensus amino acid sequence derived from a hidden Markov model is depicted in
Figure 29. For further information on sulfatase domains, see Example 1.

In one embodiment, a 26212-like protein includes at least one transmembrane
domain. As used herein, the term "transmembrane domain" includes an amino acid
sequence of about 15 amino acid residues in length that spans a phospholipid
membrane. More preferably, a transmembrane domain includes at least about 18, 20,
22, or 24 amino acid residues and spans a phospholipid membrane. For more
information on transmembrane domains, see Example 3.

In a preferred embodiment, a 26212-like polypeptide or protein has at least
one transmembrane domain or a region which includes at least 18, 20, 22, 24, 25, or
30 amino acid residues and has at least about 60%, 70%, 80%, 90%, 95%, 99%, or
100% sequence identity with a "transmembrane domain," e.g., at least one
transmembrane domain of human 26212-like polypeptide or protein (e.g., amino acid
residues 24 to 44 of SEQ ID NO:7).
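
To illustrate how a percent sequence identity threshold such as those listed above might be evaluated, here is a minimal Python sketch. It is an illustration only: the function name, the gap handling, and the two 20-residue sequences are hypothetical and are not taken from SEQ ID NO:7, and identity can be defined in other ways (for example, with a different denominator).

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the gap-free columns of two pre-aligned sequences.

    Assumption: both strings come from the same pairwise alignment, with '-'
    marking gaps; columns containing a gap are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical 20-residue segments differing at two positions.
reference = "ALVLAVLGTELLGSLCSTVR"
candidate = "ALVLAVIGTELLGSLCSAVR"
identity = percent_identity(reference, candidate)
print(identity, identity >= 60.0)
```

With 18 of 20 positions matching, this toy pair would satisfy the 60%, 70%, 80%, and 90% thresholds but not the 95% threshold.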

In another embodiment, a 26212-like protein includes at least one
"non-transmembrane domain." The C-terminal amino acid residue of a non-transmembrane
domain is adjacent to an N-terminal amino acid residue of a transmembrane domain
in a naturally occurring 26212-like protein. For more information on
non-transmembrane domains, see Example 3.

In a preferred embodiment, a 26212-like polypeptide or protein has a
"non-transmembrane domain" or a region which includes at least about 1-350, preferably
about 200-320, more preferably about 230-300, and even more preferably about
240-280 amino acid residues, and has at least about 60%, 70%, 80%, 90%, 95%, 99%, or
100% sequence identity with a "non-transmembrane domain", e.g., a
non-transmembrane domain of human 26212-like polypeptide or protein. An N-terminal
non-transmembrane domain is located at about amino acid residues 1 to 23 of SEQ ID
NO:7. A C-terminal non-transmembrane domain is located at about amino acid
residues 45 to 599 of SEQ ID NO:7.

A 26212-like molecule can further include a signal sequence. For more
information on signal sequences, see Example 3.
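
The residue ranges given in this example for SEQ ID NO:7 can be checked for consistency with the stated protein length. The sketch below is a hypothetical illustration (the `Region` class and `covers_protein` helper are not from the source); it only verifies that the three quoted ranges tile a 599-residue protein without gaps or overlaps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def covers_protein(regions: list, protein_length: int) -> bool:
    """Return True if the regions tile residues 1..protein_length exactly."""
    ordered = sorted(regions, key=lambda r: r.start)
    expected_start = 1
    for region in ordered:
        if region.start != expected_start or region.end < region.start:
            return False
        expected_start = region.end + 1
    return expected_start == protein_length + 1


# Ranges quoted in the text for human 26212 (SEQ ID NO:7, 599 amino acids).
regions_26212 = [
    Region("N-terminal non-transmembrane domain", 1, 23),
    Region("transmembrane domain", 24, 44),
    Region("C-terminal non-transmembrane domain", 45, 599),
]
lengths = [r.length for r in regions_26212]
print(lengths, covers_protein(regions_26212, 599))
```

The quoted ranges correspond to segments of 23, 21, and 555 residues; the 21-residue transmembrane segment falls within the 18 to 24 residue lengths mentioned earlier in this example.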

Example 7: Tissue Distribution of 26212 mRNA 

In six independent experiments, 26212 showed higher levels of expression in
proliferating endothelial cells as compared to arrested endothelial cells. 26212
expression was also higher in proliferating endothelial cells than in non-endothelial
cells. See Figure 30. 26212 expression levels were upregulated in breast tissue cell
lines treated with epidermal growth factor, as well. See Figure 34. 26212 is expressed
in hemangiomas and other angiogenic tissues, including fetal heart, uterine
adenocarcinoma, and endometrial polyps. See Figure 35. Endothelial and glial cells
showed higher levels of 26212 expression as compared to other tissues and cells. See
Figure 36. 26212 also showed higher levels of expression in some lung, breast and
brain tumors as compared to normal tissues. Expression levels of 26212 were found
to be higher in proliferating endothelial cells than in tumors, too. Expression levels
were determined by quantitative PCR (Taqman® brand quantitative PCR kit, Applied
Biosystems). The quantitative PCR reactions were performed according to the kit
manufacturer's instructions.
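
The text does not state how the quantitative PCR readings were converted into relative expression levels. One commonly used approach is the comparative Ct (2^-ΔΔCt) calculation, sketched below in Python as an assumption rather than as the method actually used here. The function name, the reference-gene normalisation, and all Ct values are hypothetical, and the formula assumes roughly 100% amplification efficiency for both target and reference.

```python
def relative_expression(
    ct_target_sample: float,
    ct_reference_sample: float,
    ct_target_calibrator: float,
    ct_reference_calibrator: float,
) -> float:
    """Fold change of a target transcript by the comparative Ct method.

    Each Ct is first normalised to a reference gene measured in the same
    sample; the sample is then compared with a calibrator condition.
    """
    delta_sample = ct_target_sample - ct_reference_sample
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values: proliferating cells as the sample and
# arrested cells as the calibrator.
fold_change = relative_expression(
    ct_target_sample=24.0,
    ct_reference_sample=18.0,
    ct_target_calibrator=27.0,
    ct_reference_calibrator=18.0,
)
print(fold_change)
```

In this made-up example the target crosses threshold three cycles earlier in the sample than in the calibrator after normalisation, which corresponds to an eight-fold higher relative level.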

In situ hybridization analysis was also carried out. 26212 showed weak
expression in ovarian tumor, and no expression in normal ovary. Similarly, colon
metastases showed weak expression of 26212, and normal colon tissue and primary
tumors showed no expression. A subset of lung tumors tested showed expression of
26212, while no expression was revealed in normal lung.

Angiogenic growth factors (e.g., bFGF) are present in the extracellular matrix
(ECM), and can be released from the ECM by heparinase-like enzymes. This
includes the glycosyl-sulfatases. The released growth factors in turn stimulate blood
vessel formation by, e.g., attracting endothelial cells to form new vessels. See Baird
A, Ling N., "Fibroblast growth factors are present in the extracellular matrix produced
by endothelial cells in vitro: implications for a role of heparinase-like enzymes in the
neovascular response," Biochem Biophys Res Commun. (1987) 142(2):428-35.

As noted, 26212 has amino acid sequence features that place it in the class of
glycosyl sulfate cleaving enzymes. Taqman results (above) show that its expression
is elevated in proliferating endothelial cells, suggesting that 26212 is specifically
involved in active angiogenic sites.



Example 8: Recombinant Expression of 22348, 23553, 25278, or 26212 in Bacterial 
Cells 

In this example, 22348, 23553, 25278, or 26212 is expressed as a recombinant
glutathione-S-transferase (GST) fusion polypeptide in E. coli and the fusion
polypeptide is isolated and characterized. Specifically, 22348, 23553, 25278, or
26212 is fused to GST and this fusion polypeptide is expressed in E. coli, e.g., strain
PEB199. Expression of the GST-26212 fusion protein in PEB199 is induced with
IPTG. The recombinant fusion polypeptide is purified from crude bacterial lysates of
the induced PEB199 strain by affinity chromatography on glutathione beads. Using
polyacrylamide gel electrophoretic analysis of the polypeptide purified from the
bacterial lysates, the molecular weight of the resultant fusion polypeptide is
determined.
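
Before running the gel, a rough expected size for the fusion polypeptide can be estimated. The Python sketch below is a back-of-the-envelope illustration only: it assumes an average residue mass of about 110 Da and a GST moiety of about 26 kDa (both common rules of thumb, not values from the source), and it ignores linker residues, signal peptide processing, and post-translational modification.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption
GST_TAG_MASS_KDA = 26.0          # approximate size of the GST moiety, an assumption


def estimated_fusion_mass_kda(residue_count: int) -> float:
    """Rough mass in kDa of a GST fusion carrying `residue_count` residues."""
    if residue_count <= 0:
        raise ValueError("residue_count must be positive")
    insert_kda = residue_count * AVERAGE_RESIDUE_MASS_DA / 1000.0
    return GST_TAG_MASS_KDA + insert_kda


# The text states that 26212 encodes a 599 amino acid protein.
gst_26212_kda = estimated_fusion_mass_kda(599)
print(round(gst_26212_kda, 1))
```

Such an estimate only indicates roughly where a band might be expected; the measured mobility on the gel remains the actual result.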


Example 9: Expression of Recombinant 22348, 23553, 25278, or 26212 Protein in 
COS Cells 

To express the 22348, 23553, 25278, or 26212 gene in COS cells, the
pcDNA/Amp vector by Invitrogen Corporation (San Diego, CA) is used. This vector
contains an SV40 origin of replication, an ampicillin resistance gene, an E. coli
replication origin, a CMV promoter followed by a polylinker region, and an SV40
intron and polyadenylation site. A DNA fragment encoding the entire 22348, 23553,
25278, or 26212 protein and an HA tag (Wilson et al. (1984) Cell 37:767) or a FLAG
tag fused in-frame to the 3' end of the fragment is cloned into the polylinker region of
the vector, thereby placing the expression of the recombinant protein under the
control of the CMV promoter.

To construct the plasmid, the 22348, 23553, 25278, or 26212 DNA sequence
is amplified by PCR using two primers. The 5' primer contains the restriction site of
interest followed by approximately twenty nucleotides of the 22348, 23553, 25278, or
26212 coding sequence starting from the initiation codon; the 3' end sequence
contains complementary sequences to the other restriction site of interest, a translation
stop codon, the HA tag or FLAG tag and the last 20 nucleotides of the 22348, 23553,
25278, or 26212 coding sequence. The PCR amplified fragment and the
pcDNA/Amp vector are digested with the appropriate restriction enzymes and the
vector is dephosphorylated using the CIAP enzyme (New England Biolabs, Beverly,
MA). Preferably the two restriction sites chosen are different so that the 22348,
23553, 25278, or 26212 gene is inserted in the correct orientation. The ligation
mixture is transformed into E. coli cells (strains HB101, DH5α, SURE, available
from Stratagene Cloning Systems, La Jolla, CA, can be used), the transformed culture
is plated on ampicillin media plates, and resistant colonies are selected. Plasmid
DNA is isolated from transformants and examined by restriction analysis for the
presence of the correct fragment.
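
The primer layout described above can be sketched in Python as follows. This is an illustrative assumption, not the primers actually used: the restriction sites (EcoRI and XhoI recognition sequences), the stop codon, the HA tag coding sequence (one common encoding of YPYDVPDYA), and the toy coding sequence are all chosen for the example, and the coding sequence is assumed to be supplied without its native stop codon so that the tag stays in frame.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

SITE_5_PRIME = "GAATTC"  # EcoRI recognition sequence (hypothetical choice)
SITE_3_PRIME = "CTCGAG"  # XhoI recognition sequence (hypothetical choice)
HA_TAG = "TACCCATACGATGTTCCAGATTACGCT"  # encodes YPYDVPDYA
STOP_CODON = "TAA"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(cds: str, anneal: int = 20) -> tuple:
    """Build forward and reverse primers for C-terminal HA tagging.

    `cds` must start with ATG, be a whole number of codons, and exclude
    the native stop codon (an assumption of this sketch).
    """
    if not cds.startswith("ATG") or len(cds) % 3 != 0 or len(cds) < 2 * anneal:
        raise ValueError("unexpected coding sequence")
    forward = SITE_5_PRIME + cds[:anneal]
    # Sense-strand layout at the 3' end: gene tail, tag, stop codon, site.
    sense_tail = cds[-anneal:] + HA_TAG + STOP_CODON + SITE_3_PRIME
    reverse = reverse_complement(sense_tail)
    return forward, reverse


# Toy 51-nucleotide coding sequence, not taken from any SEQ ID NO.
toy_cds = "ATGGCTAGCAAGGGTGAAGAACTGTTCACCGGTGTTGTTCCGATCCTGGTT"
forward_primer, reverse_primer = design_primers(toy_cds)
print(forward_primer)
print(reverse_primer)
```

In practice a few extra bases are often added 5' of each restriction site to help the enzyme cut near the end of the PCR product; that detail is omitted here.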

COS cells are subsequently transfected with the 22348, 23553, 25278, or
26212-pcDNA/Amp plasmid DNA using the calcium phosphate or calcium chloride
co-precipitation methods, DEAE-dextran-mediated transfection, lipofection, or
electroporation. Other suitable methods for transfecting host cells can be found in
Sambrook, J., Fritsch, E. F., and Maniatis, T. Molecular Cloning: A Laboratory
Manual. 2nd ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1989. The expression of the 22348, 23553, 25278, or
26212 polypeptide is detected by radiolabelling (35S-methionine or 35S-cysteine
available from NEN, Boston, MA, can be used) and immunoprecipitation (Harlow, E.
and Lane, D. Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, NY, 1988) using an HA specific monoclonal antibody.
Briefly, the cells are labeled for 8 hours with 35S-methionine (or 35S-cysteine). The
culture media are then collected and the cells are lysed using detergents (RIPA buffer,
150 mM NaCl, 1% NP-40, 0.1% SDS, 0.5% DOC, 50 mM Tris, pH 7.5). Both the
cell lysate and the culture media are precipitated with an HA specific monoclonal
antibody. Precipitated polypeptides are then analyzed by SDS-PAGE.

Alternatively, DNA containing the 22348, 23553, 25278, or 26212 coding
sequence is cloned directly into the polylinker of the pcDNA/Amp vector using the
appropriate restriction sites. The resulting plasmid is transfected into COS cells in the
manner described above, and the expression of the 22348, 23553, 25278, or 26212
polypeptide is detected by radiolabelling and immunoprecipitation using a 22348,
23553, 25278, or 26212 specific monoclonal antibody.

This invention may be embodied in many different forms and should not be
construed as limited to the embodiments set forth herein; rather, these embodiments are
provided so that this disclosure will fully convey the invention to those skilled in the art.
Many modifications and other embodiments of the invention will come to mind to one
skilled in the art to which this invention pertains having the benefit of the teachings
presented in the foregoing description. Although specific terms are employed, they are
used as in the art unless otherwise indicated.

THAT WHICH IS CLAIMED:
1. An isolated nucleic acid molecule selected from the group consisting
of:

a) a nucleic acid molecule comprising a nucleotide sequence
which is at least 60% identical to the nucleotide sequence of SEQ ID NO:2, 4, 6, 8,
11, 12, 13, or 14, or the nucleotide sequence of the cDNA insert of the plasmid
deposited with ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or
______, wherein said nucleotide sequence encodes a polypeptide having biological
activity;

b) a nucleic acid molecule comprising a fragment of at least 20
nucleotides of the nucleotide sequence of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14, or
the nucleotide sequence of the cDNA insert of the plasmid deposited with ATCC as
Patent Deposit Number ______, PTA-1639, PTA-1846, or ______;

c) a nucleic acid molecule which encodes a polypeptide
comprising the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid
sequence encoded by the cDNA insert of the plasmid deposited with the ATCC as
Patent Deposit Number ______, PTA-1639, PTA-1846, or ______;

d) a nucleic acid molecule which encodes a fragment of a
polypeptide comprising the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the
amino acid sequence encoded by the cDNA insert of the plasmid deposited with the
ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or ______, wherein the
fragment comprises at least 15 contiguous amino acids of SEQ ID NO:1, 3, 5, or 7, or
the amino acid sequence encoded by the cDNA insert of the plasmid deposited with
the ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or ______;

e) a nucleic acid molecule which encodes a naturally occurring
allelic variant of a biologically active polypeptide comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the
cDNA insert of the plasmid deposited with the ATCC as Patent Deposit Number
______, PTA-1639, PTA-1846, or ______, wherein the nucleic acid molecule hybridizes
to a nucleic acid molecule comprising the complement of SEQ ID NO:2, 4, 6, 8, 11,
12, 13, or 14 under stringent conditions; and

f) a nucleic acid molecule comprising the complement of a), b),
c), d), or e).

2. The isolated nucleic acid molecule of claim 1, which is selected from
the group consisting of:

a) a nucleic acid molecule comprising the nucleotide sequence of
SEQ ID NO:2, 4, 6, 8, 11, 12, 13, 14, the cDNA insert of any one of the plasmids
deposited with ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or
______, or a complement thereof; and

b) a nucleic acid molecule which encodes a polypeptide
comprising the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or an amino acid
sequence encoded by the cDNA insert of any of the plasmids deposited with ATCC as
Patent Deposit Number ______, PTA-1639, PTA-1846, or ______.

3. The nucleic acid molecule of claim 1 further comprising vector nucleic
acid sequences.

4. The nucleic acid molecule of claim 1 further comprising nucleic acid 
sequences encoding a heterologous polypeptide. 

5. A host cell which contains the nucleic acid molecule of claim 1. 

6. The host cell of claim 5 which is a mammalian host cell. 

7. A nonhuman mammalian host cell containing the nucleic acid
molecule of claim 1.

8. An isolated polypeptide selected from the group consisting of:

a) a biologically active polypeptide which is encoded by a nucleic
acid molecule comprising a nucleotide sequence which is at least 60% identical to a
nucleic acid comprising the nucleotide sequence of SEQ ID NO:2, 4, 6, 8, 11, 12, 13,

WO 01/55411    PCT/US01/03266

or 14 or the nucleotide sequence of the cDNA insert of the plasmid deposited with
ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or ______;

b) a naturally occurring allelic variant of a polypeptide comprising
the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence
encoded by the cDNA insert of the plasmid deposited with the ATCC as Patent
Deposit Number ______, PTA-1639, PTA-1846, or ______, wherein the polypeptide is
encoded by a nucleic acid molecule which hybridizes to a nucleic acid molecule
comprising the complement of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14 under
stringent conditions;

c) a fragment of a polypeptide comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the
cDNA insert of the plasmid deposited with the ATCC as Patent Deposit Number
______, PTA-1639, PTA-1846, or ______, wherein the fragment comprises at least 15
contiguous amino acids of SEQ ID NO:1, 3, 5, or 7; and

d) a polypeptide having at least 60% sequence identity to the
amino acid sequence of SEQ ID NO:1, 3, 5, or 7, wherein the polypeptide has biological
activity.

9. The isolated polypeptide of claim 8 comprising the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or an amino acid sequence encoded by the
cDNA insert of any of the plasmids deposited with ATCC as Patent Deposit Number
______, PTA-1639, PTA-1846, or ______.

10. The polypeptide of claim 8 further comprising heterologous amino 
acid sequences. 

11. An antibody which selectively binds to a polypeptide of claim 8. 

12. A method for producing a polypeptide selected from the group
consisting of:

a) a polypeptide comprising the amino acid sequence of SEQ ID
NO:1, 3, 5, or 7, or the amino acid sequence encoded by the cDNA insert of the
plasmid deposited with the ATCC as Patent Deposit Number ______, PTA-1639,
PTA-1846, or ______;

b) a polypeptide comprising a fragment of the amino acid
sequence of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence encoded by the
cDNA insert of the plasmid deposited with the ATCC as Patent Deposit Number
______, PTA-1639, PTA-1846, or ______, wherein the fragment comprises at least 15
contiguous amino acids of SEQ ID NO:1, 3, 5, or 7, or the amino acid sequence
encoded by the cDNA insert of the plasmid deposited with the ATCC as Patent
Deposit Number ______, PTA-1639, PTA-1846, or ______;

c) a biologically active naturally occurring allelic variant of a
polypeptide comprising the amino acid sequence of SEQ ID NO:1, 3, 5, or 7, or the
amino acid sequence encoded by the cDNA insert of the plasmid deposited with the
ATCC as Patent Deposit Number ______, PTA-1639, PTA-1846, or ______, wherein the
polypeptide is encoded by a nucleic acid molecule which hybridizes to a nucleic acid
molecule comprising the complement of SEQ ID NO:2, 4, 6, 8, 11, 12, 13, or 14;

d) a polypeptide having at least 60% sequence identity to the
amino acid sequence of SEQ ID NO:1, 3, 5, or 7, wherein said polypeptide has
biological activity;

comprising culturing the host cell of claim 5 under conditions in which the nucleic
acid molecule is expressed.

13. The method of claim 12 wherein said polypeptide comprises the amino
acid sequence of SEQ ID NO:1, 3, 5, or 7.

14. A method for detecting the presence of a polypeptide of claim 8 in a
sample, comprising:

a) contacting the sample with a compound which selectively binds
to a polypeptide of claim 8; and

b) determining whether the compound binds to the polypeptide in
the sample.

15. The method of claim 14, wherein the compound which binds to the
polypeptide is an antibody.

16. A kit comprising a compound which selectively binds to a polypeptide
of claim 8 and instructions for use.

17. A method for detecting the presence of a nucleic acid molecule of
claim 1 in a sample, comprising the steps of:

a) contacting the sample with a nucleic acid probe or primer
which selectively hybridizes to the nucleic acid molecule; and

b) determining whether the nucleic acid probe or primer binds to a
nucleic acid molecule in the sample.

18. The method of claim 17, wherein the sample comprises mRNA
molecules and is contacted with a nucleic acid probe.

19. A kit comprising a compound which selectively hybridizes to a nucleic
acid molecule of claim 1 and instructions for use.

20. A method for identifying a compound which binds to a polypeptide of
claim 8 comprising the steps of:

a) contacting a polypeptide, or a cell expressing a polypeptide of
claim 8 with a test compound; and

b) determining whether the polypeptide binds to the test
compound.

21. The method of claim 20, wherein the binding of the test compound to
the polypeptide is detected by a method selected from the group consisting of:

a) detection of binding by direct detecting of test
compound/polypeptide binding;

b) detection of binding using a competition binding assay;

c) detection of binding using an assay for sulfatase activity.

22. A method for modulating the activity of a polypeptide of claim 8
comprising contacting a polypeptide or a cell expressing a polypeptide of claim 8 with
a compound which binds to the polypeptide in a sufficient concentration to modulate
the activity of the polypeptide.

23. A method for identifying a compound which modulates the activity of
a polypeptide of claim 8, comprising:

a) contacting a polypeptide of claim 8 with a test compound; and

b) determining the effect of the test compound on the activity of
the polypeptide to thereby identify a compound which modulates the activity of the
polypeptide.

24. A method for identifying an agent that modulates the level of
expression of a nucleic acid molecule of claim 1 in a cell, said method comprising
contacting said agent with the cell expressing said nucleic acid molecule such that
said level of expression of said nucleic acid molecule can be modulated in said cell by
said agent and measuring said level of expression of said nucleic acid molecule.

25. A method for modulating the level of expression of a nucleic acid
molecule of claim 1, said method comprising contacting said nucleic acid molecule
with an agent under conditions that allow the agent to modulate the level of
expression of the nucleic acid molecule.

26. A pharmaceutical composition containing any of the polypeptides in
claim 8 in a pharmaceutically acceptable carrier.

[Figure: cDNA nucleotide sequence shown with its predicted amino acid translation; the sequence text is not legible in this copy.]

Input file Fbh23553FL.seq; Output File 23553.trans
Sequence length 4321

[Figure: nucleotide sequence of 23553 shown with its predicted amino acid translation; the sequence text is not legible in this copy.]

Analysis of 23553 (871 aa)

Prosite Pattern Matches

[Figure: Prosite pattern match listing; legible entries include a tyrosine kinase phosphorylation site and a sulfatase signature (SULFATASE_1). The remaining entries are not legible in this copy.]
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Input file Fbh2S278FU.seqr; Output File 25278. ti 



CCACGCGTO 



oggcgcagggoctgo^^ 
<3cagatccgc<xx»d3ccgtccc<^^ 

AGCATCCGAC 



T I. TGPSLV 
ACC CTC Ml GGC TTC TCT CTG GTC f 



GGC TAC CTC TCC t 



3 GAC T3G GCC AAG COG i 



GGC GAG C 



3 CCC TCG GCC GCT CCG C 



C CTC CTC AGC TIC 

PVADGPGEA 
C TTC GTC GCC GAC GGG CCC GGG GAG GCT 

■QPPHIXFXL*» 
C CAQ OCT CCC CAC ATC AXC TTC ATC CTC AOS GAC 

BO.GYHDVGYHGSDI S T P T I, D 
GAC CAR GGC TAC CAC GAC GTG GGC TAC CAT GOT TCA GAT ATC GAG ACC OCT AOS CTO GAC 

RtAAKGVKLENYYIOPICTP 
AGG CTG GCO GCC AAG GGG GTC AAG TTC GAG AAT TAT TAC ATC CAG CCC ATC TGc" AOg'cCT 

SRSQI.1,10 RYQItfTGLC " 

fes cosysc cao ere 6rc act ooc agg tac cag ato cac aca gga ctc cag c 



RPQCPNCLPLD 
C CGC CCA CAC CAC CCC AAC TCC CTC CCC CTG <3AC < 



T TCC ATC 

! V 1 X. P 5 K L 
G GTG ACA CTG CCA CAG AAG CTG 



T H K 



GAG TGT CTG CCC ACC CGT CGG GGC TTC GAC ACC TTC CTG 

DYYTYDNCDftPnw 
GAC TAT TAC AOC TAT GAC ARC TGT 



CAT ATG GTG GGC ASG TCG CAC CTG GGC TTC TAC CGG AAG 

S I, T G K V 
TCG CTC ACG GGC A&T GTG 



IT GGC CCA G 



IT- GAG AAT GTG GCC TGG GGG C 



TGC GGC TTC GAC CTG CAC GAG 58S 
Y A 



O CTC AGC GGC CAG TAC TCC ACT AtVctt'tAC GCC CAG CGC 

A.SIirtASHSPQRPL, g L Y V A. F, 
OCOAGCCATATCCTCGCCAGCCACAGCCCTCi^OST CCC CTC TTC CTC TAT GTG GCC TTC 

QAVHTPLO.S*RB'Tt,YttYR<rK 
CMS CCA GTA CAC ACA CCC CTG CAG TCC OCT CGT GAG TAC CTG TAC CGC TAC CGC ACC ATG 

6HVARRKYAAMVTr>Mnir» tr r> 
GGC AAT GTG C 



8 CGG AAG TAC GCG GCC ATG GTG ACC TGC ATC GAT GAG GCT GTG CGC 825 



I T 



K R Y G F Y 



H S 



I' I P S 



ARC ATC ACC TCG GCC CTC AAG CGC TAC GGT TTC TAC AAC AAC AGT GTC ATC ATC TTC TCC 

« D M ° G 0 » P S G G S KWPLKGK K 
ACT GAC AAT GGT GGC CAG ACT TTC TCG GGG GGC AGC AAC TCG CCG CTC CGA GGA CGC AAG 



GGC ACT TAT TCG GAA GGT GGC GTO CGG C 



« CAC AGT. CCC CTO CTC AAG 1005 
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ocsv asg CAA OSS A 



i«»CG3oacRSMecscAicrasiici 



GOT CTG GCA GGT GOT ACC TCA GCA GCC GAT GGG CTA GAT G3C TAC GAC GTC TCG CCG 

GCC ATC AGO GAG GGC CGG GCC TCA CCA CGC AOG GAG ATC CTG CAC AAC ATT <J ccCA CTC 
JT * H A Q K G S £ E O G P G I W H * * « 
SAC AAC CAT GCC CAG CAT GGC TCC CTG GAG GGC GGC TTT GGG ATC TGG AAC ACC GCC GTG 



S OCT GCC ATC C 



2 GTG GGT GAG T 



fflSG ATC OCA CCG CAG ACA CTG GCC ACC TTC COG < 

A * V r q a v w r. v m 

GCC AGT GTC CGC CAG GCC GTG 1GG CTC TTC AAC J 

DI.AGQRPDVVR 
GAC CTG GCT G3C CAG CQ3 OCT GAT GTG GTC CGC t 



G ACA GGA GAC CCC GGC TAT GGC GAT 

'S WWNLERM 
TAGCTGGTGGAACCTGGAACGAATG 

SADPY ERs 
C AST GCT GAC CCT TAT GAA CG3 GAG 

* a R L & E Y 

C CTG CTG GCT CGC CTG GCC GAA TAT I 



■R8C OSC ACA GCC ATC COG GTA CGC TAC CCA GCT GAG AAC CCC CGG GCT CAT CCT GAC TTT 

* G G A W G P WASDEEEEFP*«„ 
A3&T *GQ6 GOT OCT TGG GGG COC TGG GCC AGT GAT GAG GAA GAG GAG GAA GAG GAA GGG AGG 

A . ■•* S 3? S R G R R K K K C K 1 C It T » o 
GCT-CGA AGC TTC TCC COG GGT COT CGC AAG AAA AAA TGC ASG ATP TGC AAG CTT CGA TCC 

W « CGT AAA CTC AAC ACC AGG CTA A' 

GCTGTTICTCAQQGAGAft 




^TTAACAOSGGiATCCCTG 
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Analysis of 25278 (569 aa) 



SS?Ef942 776* S4324I4I£fr? 4828322261 23333257 
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26212 seqs 



DNA sequence (nt 706-2118 coding)

CRCGCS^CCGCCeACGCGTCCGTGGAGRTATTAACTTTTTTCTTTTTTTTCTTCCTTGGTGGAAGCTGCTCTAGGGAGSGGGGAGGAGGA 

GGAGAAAGTGAAATGTGCTGGAGAAGAGCGAGCCC'TCCTTGTTC'PTCCGGAGTCCCATCCATrAAGCCATCACTTCTGGAAGATTAAAGT 

TGTCGGACATGGTGACAGCTGAGAGGAGAGGAGGATTTCTTGCCAGGTGGAGAGTCTiCACCGTCTGTTGGGTGCATGTGTGCGCCCGCA 

GCGGCGCGGGGCGCGTGGTTCTCCGCGTGGAGTCTCACCTGGGACCTGAGTGAATGGCTCCCAGGGGCTGTGCGGGGCAICCGCCTCCGC 

CTTCTCCACAGGCCIGTGrCTGTCCTGGAAAGATGCTAGCAATGGGGGCGCTGGCAGGATTCTGGATCCTCTGCCTCCTCACTTATGGTT 

ACCTGTCCTGGGGCC^GGCCTTAGAAGAGGAGGAAGAAGGGGCCTTACTAGCTCAAGCTGGAGAGAAACTAGAGCCCAGCACAACrrCCA 

CCTCCCAGCCCCATCTCAT'H'TCATCCTAGCGGATGATCAGGGATTTAGAGATGTGGGTTACCACGGATCTGAGATTAAARCACCTACTC 

TTGAOUiGCrCGCrGCCGARGGAGTXAAACTGGAGARCTACTATCTCCAGCCrAXTTGCACACCATCCAGGAGTCAGrTTATTACrGGAA 

AGTATCAGATACACACCGGACTTCAfiCATTCTATCATAAGACCTACCCASCCCAACTGTTTACCTCTGGACAATGCCACCC'TACCrCAGA 

AACTGAAGGAGGTTGGATATTCAACGCATATGGTCGGAAAA2GGCACTTGGGTTTTTACAGAAAAGAATGCATGCCCACCAGAAGAGGAT 

TTGATACCTTTTTTGGTTCCCTSTTGGGAAGTGGGGATTAOTATACACAGTACAAATGTGACAGTCCTGGGATGTGTGGCTATGACTTGT 

ATGAAAACGACAATGCTGCCTGGGACTArGACAATGGCATATACTCCACACAGATGTACACTCAGAGAGTACAGCAAATCTTAGCTTCCC 

ATAACCCCACAAAGCCTATATTTTTATATATTGCCTATCAAGCTGTTCATTCACCACTGCAAGCTCCTGGCAGGTATTTCGAACACTACC 

GATCCATTATCAACATAAACAGGAGGAGATATGCTGCCATGCTTTCCTGCTTAGATGAAGCAATCAACAACGTGACATTGGCTCTA 

CTTATGGTXTCTATAACAACAGCATTATCATTTACTCTTCaGATAATGGTGGCCAGCCTACGGCAGGAGGGAGTAACTGGCCTCTCAGAG 

GTAGCAAAGGAACATATTGGGAAGGAGGGAtCCGGGCTGTAGGCXTTGTGCATAG^^ 

AACn^TGCACATCACTGACTGGTACCCCaCTCTCATTTCACTGGCTGAAGGACAGATTGATGAGGACAT 

TCTGGGAGACCATAAGTGAGGGTCTTCGCTCACCCCGAGTAGATAT1T1GCATAACATTGACCCCATATACACCAAGGGAAAAAATGGCT 
CCTGGGCAGCAGGraASGGGATCTGGAACACTGCAATCaAGTCAGCCATCAGAGTGCAGCACTGGAAATTGCTTACAGGAAATCCTGGCT 
ACAGCGACTGGGTCCCCCCTCAGTCTTTCAGCAACCTGGGACCGAACCGGTGGCACAATGAACGGATCACCTTGTCAACTGGCAAAAGTG 
TATGGCTTraCAACmTCAC»GCCGACCCATATGAGAGGGTO^ 

TCTCACAGTTCAAGAAAAGTGCAGTGCCGGTCAGGTATCCCCCCAAAGACCCCAGAAGTAACCCTAGGCTCAATGGAGGGGTCTGGGGRC 

CATGGTATAAAGAGGAAACCAAGAAAAAGAAGCCAAGCAAAMT^^ 

AGC»GAAAGCAGTCTC&GCTTCAACTTGCCATTCA^ 

TCTTATCTTTCATCTGTTTCCTAGGTAAA^ 



Protein sequence 

b»PRGCAGHPEPPSPQACVCPGKKLAMGA^GFWIU:i,^ 
VGYH6SEIKTPTLDKI«AAEGVKLENYYVQPICTPSRSQFITGKYQIHTC 
FYRKECMPTRRGFDTFFGSM^SGDYYTHYKCDSPGMC^YDLYEKOT^ 
PW^GRYFEEYRSnNItmRRYAAMLSCtDEAINW^ 

SPLLKKKGTVCKELVHITDWYPTlISLAEGQlDEDrQLDGYDIWETISEGLRSPRVDIl^NIDPIYTKAKNGSWAAGYGIWNTAIQSAIR 
VQHWKLLTGNPGYSDWPPQSFSNLGPNRHHtffiRirLSTGKSVHiraiTADPYERVDLSNRYPGIVKKLLRRLSQFNK^AVPVRYPPKDP 
RSNPRLNGGWGPVfYKEETKKKKPSKKQAEKSQKKSKKKKKKQQKAVSGSTCHSGVTCG 
Prosite Pattern Matches for 26212

Prosite version: Release 12.2 of February 1995

>PS00001|PDOC00001|ASN_GLYCOSYLATION N-glycosylation site.

Query: 157 NATL 160
Query: 306 NVTL 309
Query: 318 NNSI 321
Query: 431 NGSW 434
Query: 497 NITA 500
Query: 527 NKTA 530

>PS00004|PDOC00004|CAMP_PHOSPHO_SITE cAMP- and cGMP-dependent protein kinase
phosphorylation site.

Query: 521 RRLS 524
Query: 562 KKPS 565

>PS00005|PDOC00005|PKC_PHOSPHO_SITE Protein kinase C
phosphorylation site.

Query: 131 TGK 133
Query: 189 TRR 191
Query: 243 TQR 245
Query: 413 SPR 415
Query: 489 TGK 491
Query: 509 SNR 511
Query: 559 TKK 561
Query: 576 SKK 578

>PS00006|PDOC00006|CK2_PHOSPHO_SITE Casein kinase II
phosphorylation site.

FIGURE 18A

Query: 298 SCLD 301
Query: 347 TYWE 350
Query: 386 SLAE 389
Query: 406 TISE 409

>PS00007|PDOC00007|TYR_PHOSPHO_SITE Tyrosine kinase
phosphorylation site.

Query: 163 KLKEVGY 169

>PS00008|PDOC00008|MYRISTYL N-myristoylation site.

Query: 28 GALAGF 33
Query: 56 GALLAQ 61
Query: 139 GLQHSI 144
Query: 198 GSLLGS 203
Query: 235 GIYSTQ 240
Query: 329 GGQPTA 334
Query: 343 GSKGTY 348
Query: 351 GGIRAV 356
Query: 432 GSWAAG 437
Query: 439 GIWNTA 444

>PS00149|PDOC00117|SULFATASE_2 Sulfatases signature 2.
Query: 168 GYSTHMVGKW 177

>PS00523|PDOC00117|SULFATASE_1 Sulfatases signature 1.
Query: 120 PICTPSRSQFITG 132

FIGURE 18B
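
The short-motif hits tabulated in FIGURES 18A and 18B can be reproduced with ordinary regular expressions. The following Python sketch is illustrative only and is not the software used to generate the figures. The names `PATTERNS` and `scan` are hypothetical; the four expressions are direct transcriptions of the published PROSITE patterns PS00001, PS00004, PS00005 and PS00006. The two test fragments are read from the 26212 figures, so the exact residues are an assumption wherever the printed sequence is damaged.

```python
import re

# PROSITE patterns rewritten as Python regular expressions.
PATTERNS = {
    "ASN_GLYCOSYLATION": r"N[^P][ST][^P]",  # N-{P}-[ST]-{P}
    "CAMP_PHOSPHO_SITE": r"[RK]{2}.[ST]",   # [RK](2)-x-[ST]
    "PKC_PHOSPHO_SITE": r"[ST].[RK]",       # [ST]-x-[RK]
    "CK2_PHOSPHO_SITE": r"[ST].{2}[DE]",    # [ST]-x(2)-[DE]
}


def scan(seq, first_pos=1):
    """Return (pattern name, start, end, matched residues) for every hit.

    first_pos is the residue number of seq[0], so that fragments can be
    reported in the numbering of the full-length protein.
    """
    hits = []
    for name, pattern in PATTERNS.items():
        # A lookahead is used so that overlapping sites are all reported.
        for match in re.finditer(f"(?=({pattern}))", seq):
            site = match.group(1)
            start = match.start() + first_pos
            hits.append((name, start, start + len(site) - 1, site))
    return sorted(hits, key=lambda hit: hit[1])


# Fragment assumed to correspond to residues 306-329 of 26212.
for hit in scan("NVTLALKTYGFYNNSIIIYSSDNG", first_pos=306):
    print(hit)
```

Run on the assumed fragment, the sketch reports the N-glycosylation sites at 306-309 (NVTL) and 318-321 (NNSI), in agreement with the figure. Such low-complexity patterns match frequently by chance, so a hit indicates a candidate site rather than a demonstrated modification.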
FIGURE 19
FIGURE 20
FIGURE 27

NDR19 Breast T DCIS 47.84
MDA138 Breast N Normal 52.89
NDR01 Breast T IDC 44.79
NDR15 Breast T DC 29.55
NDR 133 Breast T ILC 43.26
MDA161 Breast T IDC 60.13
MDA155 Breast T IDC/DCIS 20.11
PIT270 Lung N Normal 36.00
CHT427 Lung N Normal 26.54
PIT 241 Lung N Normal 31.45
PIT 298 Lung N Normal 17.57
CHT800 Lung T AC 31.46
CHT335 Lung T SCC 35.02
CHT447 Lung T WS 27.19
CHT752 Lung T AC 3*9
CHT799 Lung T AC 5.74
CHT813 Lung T SCC 47.18
CHT369 Lung T SCC 42.37
CHT371 Colon N Normal 2.37
CHT396 Colon N Normal 16.34
CHT398 Colon N Normal 1554
NDR 104 Colon N Normal 20.89
CHTSZO Colon T Adeno 11.71
CHT122 Colon T Adeno 360.79
CHT536 Colon T Adeno loo

CHT528 Colon T
CHT388 Colon T
CHT372 Colon T
CHT532 Colon T
CHT7T Liver Met
CHT321 Liver Met
CHT 84 Liver Met
NDR 100 Liver Met
NDR 154 Liver N
CHT3Z2 Liver N
PIT 51 Liver N
CHT 339 Liver

Adeno 11.63
Adeno 872^2
Adeno 2.39
Adeno 4.45
Met 23.43
Met 11.35
Met 46.21
Normal 7.31
Normal 9.30
Normal 1.77
Normal 1.58

PIT 265 Breast N Normal 37.40
MDA335 Breast N Normal 4557
NDR 132 Breast T DCIS 10.56
NDR 13 Breast N Normal 6.73
NDR 56 Breast N Normal 20.61

FIGURE 28B 
Alignments of top-scoring domains:

Sulfatase: domain 1 of 1, from 76 to 502: score 324.5, E = 1.3e-93

*->PHvHilaDDlQigdlgcyghptirTPnldrLReeQlrPtnhytatp 
P+ ++ilaDD+G+ d+g++g ++i TP+ld+i«A+eG+++ n+y+ +p 
26212 76 PIILIPIIJU)DQGFiU3VQYHQ-SBIKTPTIJ5KLAAEGVKIB!mV-QP 120 

ICsPSSAaLlTSryphrhGnrvBngrlgvlgftaksgglpldettLpellik 
+C+PSR+++ TG+y+f-t+G + + + ++ +lpld +tto+ Iik 

262 12 121 rCTPSRSQFITGKYQIHTGLQH SIIRPTQPKCLP1.0NATI.PQKLK 165 

eaGYaTglvGKWElglnensdaagdgehlPlgvn^3fdy£dgflygspfty 
e GY T++vGX«Elg++4- +e+ P++ rGfd f+g 14-gs ++y 

26212 166 BVGYSTHMVGKHEIiGFyR KECMPTR-RGFDTPFGSIjLGSQDYY 207 

deencdngegteppeaypeqgwipgilgyyltdlladkalglldvasaag 
++ cd 4-p+ aa 
2S212 208 THYKCD SPGM - CGVPLYEsTONAA- 229 

rllakalaasrPFflyisppaphfsilfrnfkevaqpyrapgltqlfvde 
++++ + ++tg+++++ 

26212 230 WD YD - -NGIYSTQMYTQR 245 

aadfiernk.ekPfflylaflrlhvhtplfspaedleskdflgrsqrgrY 
++++++ kP fly a++ +vh pj++p + e+++ r+rJT 
26212 246 VQQILASHNpTKPIFLYIAYQ — AVHSPLQAPGRYFEHYRSIININRRRY 293 

gdlveeTODdlvGrvldaledlGlldlilTlvifTSDnGahlegtpewygggn 
D+++++V aL+ G ++» ++i*-+SDnG g+p+ +gg+n 
26212 294 AAMLSOjDBA IKK VTLAliKTYGFX NHS 1 i I YSSDNG GQPT-AGGSN 338 

gplkggKgygslyeGgiRvPllvxwPggiapagrvkekselvslivDlaPT 
+pl+g Kg+ +eGgiR ++V++P + +g+v + elv++ D++PT 
26212 339 WPLRGSKGTY- -KEGGIPAVGFVHSP-IiKKKGTVCK- -ELVHITDWYPT 383 

ildlAGaplPkvanGakdrplDGvsllplllggaapsrrahetlfhyngk 
+ +1A + +♦ d 1DG++++ + +g + s+ + +++h+ 
26212 384 LISLAEGQ1DE CIQIDGYOIWBTISEGLR-SP- -RVDILHH 421 

grklravrwprksgktpklkahff tpaf 

++ ++ +k+ + + a + ++ ++ + ++ + +++++ ++ 
26212 422 "---IDPIYTKAKN---GSMARGYGlWraiqsairvqhwklltgnpgyed 465 

dddtnngwecvgtvsqaddiedcrcegvetvthhdppelyDlsrDP 

++++ n+g + ++ e t+ + +1++ ++DP 

26212 466 wvppQSPeSLG— — 
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FIGURE 30 
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FIGURE 31A 
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26212.1 Expression in Oncology Plate II 


FIGURE 31B
26212.1 Expression in Clinical Breast Samples
26212.1 Expression in Clinical Lung Samples
FIGURE 33
FIGURE 34
FIGURE 35
FIGURE 36
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SEQUENCE LISTING 

<110> Glucksman, Maria Alexandra
Williamson, Mark
Tsai, Fong-Ying
Rudolph-Owen, Laura A.

<120> 22438, 23553, 25278, and 26212 Novel
Human Sulfatases (A CIP Application)



<130> 35800/208709 

<151> 

<160> 14 

<170> FastSEQ for Windows Version 4.0 

<210> 1 
<211> 525 
<212> PRT 

<213> homo sapiens 



<400> 1 



Met Gly Trp Leu Phe Leu Lys Val Leu Leu Ala Gly Val Ser Phe Ser
1 5 10 15

Gly Phe Leu Tyr Pro Leu Val Asp Phe Cys Ile Ser Gly Lys Thr Arg
20 25 30

Gly Gln Lys Pro Asn Phe Val Ile Ile Leu Ala Asp Asp Met Gly Trp
35 40 45

Gly Asp Leu Gly Ala Asn Trp Ala Glu Thr Lys Asp Thr Ala Asn Leu
50 55 60

Asp Lys Met Ala Ser Glu Gly Met Arg Phe Val Asp Phe His Ala Ala
65 70 75 80

Ala Ser Thr Cys Ser Pro Ser Arg Ala Ser Leu Leu Thr Gly Arg Leu
85 90 95

Gly Leu Arg Asn Gly Val Thr Arg Asn Phe Ala Val Thr Ser Val Gly
100 105 110

Gly Leu Pro Leu Asn Glu Thr Thr Leu Ala Glu Val Leu Gln Gln Ala
115 120 125

Gly Tyr Val Thr Gly Ile Ile Gly Lys Trp His Leu Gly His His Gly
130 135 140

Ser Tyr His Pro Asn Phe Arg Gly Phe Asp Tyr Tyr Phe Gly Ile Pro
145 150 155 160

Tyr Ser His Asp Met Gly Cys Thr Asp Thr Pro Gly Tyr Asn His Pro
165 170 175

Pro Cys Pro Ala Cys Pro Gln Gly Asp Gly Pro Ser Arg Asn Leu Gln
180 185 190

Arg Asp Cys Tyr Thr Asp Val Ala Leu Pro Leu Tyr Glu Asn Leu Asn
195 200 205

Ile Val Glu Gln Pro Val Asn Leu Ser Ser Leu Ala Gln Lys Tyr Ala
210 215 220

Glu Lys Ala Thr Gln Phe Ile Gln Arg Ala Ser Thr Ser Gly Arg Pro
225 230 235 240

Phe Leu Leu Tyr Val Ala Leu Ala His Met His Val Pro Leu Pro Val
245 250 255

Thr Gln Leu Pro Ala Ala Pro Arg Gly Arg Ser Leu Tyr Gly Ala Gly
260 265 270

Leu Trp Glu Met Asp Ser Leu Val Gly Gln Ile Lys Asp Lys Val Asp
275 280 285

His Thr Val Lys Glu Asn Thr Phe Leu Trp Phe Thr Gly Asp Asn Gly
290 295 300

Pro Trp Ala Gln Lys Cys Glu Leu Ala Gly Ser Val Gly Pro Phe Thr
305 310 315 320

Gly Phe Trp Gln Thr Arg Gln Gly Gly Ser Pro Ala Lys Gln Thr Thr
325 330 335

Trp Glu Gly Gly His Arg Val Pro Ala Leu Ala Tyr Trp Pro Gly Arg
340 345 350

Val Pro Val Asn Val Thr Ser Thr Ala Leu Leu Ser Val Leu Asp Ile
355 360 365

Phe Pro Thr Val Val Ala Leu Ala Gln Ala Ser Leu Pro Gln Gly Arg
370 375 380

Arg Phe Asp Gly Val Asp Val Ser Glu Val Leu Phe Gly Arg Ser Gln
385 390 395 400

Pro Gly His Arg Val Leu Phe His Pro Asn Ser Gly Ala Ala Gly Glu
405 410 415

Phe Gly Ala Leu Gln Thr Val Arg Leu Glu Arg Tyr Lys Ala Phe Tyr
420 425 430

Ile Thr Gly Gly Ala Arg Ala Cys Asp Gly Ser Thr Gly Pro Glu Leu
435 440 445

Gln His Lys Phe Pro Leu Ile Phe Asn Leu Glu Asp Asp Thr Ala Glu
450 455 460

Ala Val Pro Leu Glu Arg Gly Gly Ala Glu Tyr Gln Ala Val Leu Pro
465 470 475 480

Glu Val Arg Lys Val Leu Ala Asp Val Leu Gln Asp Ile Ala Asn Asp
485 490 495

Asn Ile Ser Ser Ala Asp Tyr Thr Gln Asp Pro Ser Val Thr Pro Cys
500 505 510

Cys Asn Pro Tyr Gln Ile Ala Cys Arg Cys Gln Ala Ala
515 520 525

<210> 2 

<211> 2175 

<212> DNA

<213> homo sapiens

<220>
<221> CDS

<222> (248)...(1825)
<400> 2

cacgcgtccg caaatttcct gattcttttg aattaggatt ccagatgggg gcctcatttc
tacagccccc aacattccta tagccgttat cactgccatc accactgcca ccagcatctt
cttgcagatt ccacccctgc tccccagaga cttcctgctt tgaaagtgag cagaaaggaa
gctctcagaa aaatctctag tggtggctgc cgtcgctcca gacaatcgga atcctgcctt
caccacc atg ggc tgg ctt ttt cta aag gtt ttg ttg gcg gga gtg agt
Met Gly Trp Leu Phe Leu Lys Val Leu Leu Ala Gly Val Ser
1 5 10

ttc tca gga ttt ctt tat cct ctt gtg gat ttt tgc atc agt ggg aaa
Phe Ser Gly Phe Leu Tyr Pro Leu Val Asp Phe Cys Ile Ser Gly Lys
15 20 25 30

aca aga gga cag aag cca aac ttt gtg att att ttg gee gat gac ata 
Thr Arg Gly Gin Lys Pro Asn Phe Val He He Leu Ala Asp Asp Met 
35 40 45 

f, 99 ¥ g If gaC Ctg 9Sfa sca aac «*# 9ca gaa aca aag gac act gec 
Gly Trp Gly Asp Leu Gly Ala Asn Trp Ala Glu Thr Lys Asp Thr Ala 
50 55 go 

aac ctt gat aag atg get teg gag gga atg agg ttt gtg gat ttc cat 
Asn Leu Asp Lys Met Ala Ser Glu Gly Met Arg Phe Val Asp Phe His 
65 70 75 

gca get gee tec acc tgc tea ccc tec egg get tec ttg etc acc ggc 
Ala Ala Ala Ser Thr Cys Ser Pro Ser Arg Ala Ser Leu Leu Thr Gly 
»0 85 go 

egg ctt ggc ctt cgc aat gga gtc aca egs aac ttt gca gtc act tct 
Arg Leu Gly Leu Arg Asn Gly Val Thr Arg Asn Phe Ala Val Thr Ser 
95 100 105 11Q 

2 
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gtg gga ggc ctt ccg etc aac gag acc acc ttg gca gag gtg ctg cag 625 
Val Gly Gly Leu Pro Leu Asn Glu Thr Thr Leu Ala Glu Val Leu Gin 
115 120 125 

cag gcg ggt tac gtc act ggg ata ata ggc aaa tgg eat ctt gga cac 673 
Gin Ala Gly Tyr Val Thr Gly lie He Gly Lys Trp His Leu Gly His 
130 135 140 

cac ggc tct tat cac ccc aac ttc cgt ggt ttt gat tac tac ttt gga 721 
His Gly Ser Tyr His Pro Asn Phe Arg Gly Phe Asp Tyr Tyr Phe Gly 
145 150 155 

ate cca tat age cat gat atg ggc tgt act gat act cca ggc tic aac 769 
He Pro Tyr Ser His Asp Met Gly Cys Thr Asp Thr Pro Gly Tyr Asn 
16,0 165 170 

cac cct cct tgt cca gcg tgt cca cag ggt gat gga cca tea agg aac 817 
His Pro Fro Cys Pro Ala Cys Pro Gin Gly Asp Gly Pro Ser Arg Asn 
175 180 185 190 

ctt caa aga gac tgt tac act gac gtg gec etc cct ctt tat gaa aac 865 
Leu Gin Arg Asp Cys Tyr Thr Asp Val Ala Leu Pro Leu Tyr Glu Asn 
195 200 205 

etc aac att gtg gag cag ccg gtg aac ttg age age ctt gee cag aag 913 
Leu Asn He Val Glu Gin Pro Val Asn Leu Ser Ser Leu Ala Gin Lys 
210 215 220 

tat get gag aaa gca aec cag ttc ate cag cgt gca age acc age ggg 961 
Tyr Ala Glu Lys Ala Thr Gin Phe He Gin Arg Ala Ser Thr Ser Gly 
225 230 235 

agg ccc ttc ctg etc tat gtg get ctg gee cac atg cac gtg ccc tta 1009 
Arg Pro Phe Leu Leu Tyr Val Ala Leu Ala His Met His Val Pro Leu 

240 245 .2S0; 

ccc gtg act cag eta cca gca gcg cca egg ggc aga age ctg tat ggt 1057 
Pro Val Thr Gin leu Pro Ala Ala Pro Arg Gly Arg Ser Leu Tyr Gly 
255 260 265 270 

gca ggg etc tgg gag atg gac agt ctg gtg ggc cag ate aag gac aaa 1105 
Ala Gly Leu Trp Glu Met Asp Ser Leu Val Gly Gin He Lys Asp Lys 
275 280 285 

gtt gac cac aca gtg aag gaa aac aca ttc etc tgg ttt aca gga gac 1153 
Val Asp His Thr Val Lys Glu Asn Thr Phe Leu Trp Phe Thr Gly Asp 
290 295 300 

aat ggc ccg tgg get cag aag tgt gag eta gcg ggc agt gtg ggt ccc 1201 
Asn Gly Pro Trp Ala Gin Lys Cys Glu Leu Ala Gly Ser Val Gly Pro 
305 310 315 

ttc act gga ttt tgg caa act cgt caa ggg gga agt cca gec aag cag 1249 
Phe Thr Gly Phe Trp Gin Thr Arg Gin Gly Gly Ser Pro Ala Lys Gin 
320 325 330 

acg acc tgg gaa gga ggg cac egg gtc cca gca ctg get tac tgg cct 1297 
Thr Thr Trp Glu Gly Gly His Arg Val Pro Ala Leu Ala Tyr Trp Pro 
335 340 345 350 

ggc aga gtt cca gtt aat gtc acc age act gec ttg tta age gtg ctg 1345 
Gly Arg val Pro Val Asn Val Thr Ser Thr Ala Leu Leu Ser Val Leu 
355 360 365 

gac att ttt cca act gtg gta gec ctg gee cag gee age tta cct caa 1393 
Asp He Phe Pro Thr Val Val Ala Leu Ala Gin Ala Ser Leu Pro Gin 
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gga egg cgc ttt gat ggt gtg gac gtc tec gag gtg etc ttt ggc egg 
Gly Arg Arg Phe Asp Gly Val Asp Val Ser Glu Val Leu Phe Gly Arg 
385 390 395 



gga gag ttt gga gec ctg cag act gtc cgc ctg gag cgt tac aag gee 1537 
Gly Glu Phe Gly Ala Leu Gin. Thr Val Arg Leu Glu Arg Tyr Lys Ala 
415 420 425 430 

ttc tac att acc ggt gga gec agg gcg tgt gat ggg age acg ggg cct 1585 
Phe Tyr He Thr Gly Gly Ala Arg Ala Cys Asp Gly Ser Thr Gly Pro 
435 440 445 

gag ctg cag cat aag ttt cct ctg att ttc aac ctg gaa gac gat acc 1633 
Glu Leu Gin His Lys Phe Pro Leu He Phe Asn Leu Glu Asp Asp Thr 
450 455 460 

gca gaa get gtg ccc eta gaa aga ggt ggt gcg gag tac cag get gtg 1681 
Ala Glu Ala Val Pro Leu Glu Arg Gly Gly Ala Glu Tyr Gin Ala Val 
465 470 475 

ctg ccc gag gte aga aag gtt ctt gca gac gtc etc caa gac att gee 1729 
Leu Pro Glu Val Arg Lys Val Leu. Ala Asp Val Leu Gin Asp He Ala 
480 485 490 

aac gac aac ate tec age gca gat tac act cag gac cct tea gta act 1777 
Asn Asp Asn He Ser Ser Ala Asp Tyr Thr Gin Asp Pro Ser Val Thr 
WS 500 505 510 

ccc tgc tgt aat ccc tac caa att gec tgc cgc tgt caa gec gca taa 1825 
pro Cys Cys Asn Pro Tyr Gin He Ala Cys Arg Cys Gin Ala Ala * . 

515 520 525 

cagaccaatt tttattccac gaggaggagt acctggaaat taggcaagtt tgcttccaaa 1885 

tttcattttt accctcttta caaacacacg ctttagttta gtcttggagt ttagttttgg 1945 

agttagcctt gcatatccct tctgtatcct gtccctcctc cacgccgacc cgagagcagc 2005 

tgagctgege tggctctggg cagggagtgt gecttaatgg gaagcacacg ggctttggag 2065 

tcaggcacag gtgccagctc cagcttttga acttgggcaa ttgtttaacc taacctgcaa 2125 

gttgattttg agggttaaat aaaggcatac atgaaaaaaa aaaaaaaaaa 2175 
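
The relationship between a CDS feature and the length of the encoded polypeptide can be checked arithmetically. The following Python sketch is provided for illustration only and is not part of the sequence listing. The names `CODON_TABLE`, `cds_residues` and `translate` are hypothetical, the standard genetic code is assumed, and the CDS range is assumed to include the terminating stop codon, as it does for the feature at (248)...(1825).

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def cds_residues(start, end):
    """Residues encoded by a CDS at start..end (1-based, inclusive).

    The range is assumed to end with a stop codon, which is not counted.
    """
    length = end - start + 1
    if length % 3:
        raise ValueError("CDS length is not a multiple of three")
    return length // 3 - 1


def translate(dna):
    """Translate a nucleotide string codon by codon up to the first stop."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(cds_residues(248, 1825))                   # CDS of SEQ ID NO:2
print(translate("atg ggc tgg ctt ttt cta aag"))  # first seven codons
```

For the feature at (248)...(1825) the sketch returns 525, matching the length given at <211> for SEQ ID NO:1, and the first seven codons translate to MGWLFLK (Met Gly Trp Leu Phe Leu Lys). The same check applied to a CDS at (510)...(3125) gives 871 residues.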

<210> 3 

<211> 871 

<212> PRT 

<213> homo sapiens 



Met Lys Tyr Ser Cys Cys Ala Leu Val Leu Ala Val Leu Gly Thr Glu
1 5 10 15

Leu Leu Gly Ser Leu Cys Ser Thr Val Arg Ser Pro Arg Phe Arg Gly
20 25 30

Arg Ile Gln Gln Glu Arg Lys Asn Ile Arg Pro Asn Ile Ile Leu Val
35 40 45

Leu Thr Asp Asp Gln Asp Val Glu Leu Gly Ser Leu Gln Val Met Asn
50 55 60

Lys Thr Arg Lys Ile Met Glu His Gly Gly Ala Thr Phe Ile Asn Ala
65 70 75 80

Phe Val Thr Thr Pro Met Cys Cys Pro Ser Arg Ser Ser Met Leu Thr
85 90 95

Gly Lys Tyr Val His Asn His Asn Val Tyr Thr Asn Asn Glu Asn Cys
100 105 110

Ser Ser Pro Ser Trp Gln Ala Met His Glu Pro Arg Thr Phe Ala Val
115 120 125
Tyr Leu Asn Asn Thr Gly Tyr Arg Thr Ala Phe Phe Gly Lys Tyr Leu
130 135 140

Asn Glu Tyr Asn Gly Ser Tyr Ile Pro Pro Gly Trp Arg Glu Trp Leu
145 150 155 160

Gly Leu Ile Lys Asn Ser Arg Phe Tyr Asn Tyr Thr Val Cys Arg Asn
165 170 175

Gly Ile Lys Glu Lys His Gly Phe Asp Tyr Ala Lys Asp Tyr Phe Thr
180 185 190

Asp Leu Ile Thr Asn Glu Ser Ile Asn Tyr Phe Lys Met Ser Lys Arg
195 200 205

Met Tyr Pro His Arg Pro Val Met Met Val Ile Ser His Ala Ala Pro
210 215 220

His Gly Pro Glu Asp Ser Ala Pro Gln Phe Ser Lys Leu Tyr Pro Asn
225 230 235 240

Ala Ser Gln His Ile Thr Pro Ser Tyr Asn Tyr Ala Pro Asn Met Asp
245 250 255

Lys His Trp Ile Met Gln Tyr Thr Gly Pro Met Leu Pro Ile His Met
260 265 270

Glu Phe Thr Asn Ile Leu Gln Arg Lys Arg Leu Gln Thr Leu Met Ser
275 280 285

Val Asp Asp Ser Val Glu Arg Leu Tyr Asn Met Leu Val Glu Thr Gly
290 295 300

Glu Leu Glu Asn Thr Tyr Ile Ile Tyr Thr Ala Asp His Gly Tyr His
305 310 315 320

Ile Gly Gln Phe Gly Leu Val Lys Gly Lys Ser Met Pro Tyr Asp Phe
325 330 335

Asp Ile Arg Val Pro Phe Phe Ile Arg Gly Pro Ser Val Glu Pro Gly
340 345 350

Ser Ile Val Pro Gln Ile Val Leu Asn Ile Asp Leu Ala Pro Thr Ile
355 360 365

Leu Asp Ile Ala Gly Leu Asp Thr Pro Pro Asp Val Asp Gly Lys Ser
370 375 380

Val Leu Lys Leu Leu Asp Pro Glu Lys Pro Gly Asn Arg Phe Arg Thr
385 390 395 400

Asn Lys Lys Ala Lys Ile Trp Arg Asp Thr Phe Leu Val Glu Arg Gly
405 410 415

Lys Phe Leu Arg Lys Lys Glu Glu Ser Ser Lys Asn Ile Gln Gln Ser
420 425 430

Asn His Leu Pro Lys Tyr Glu Arg Val Lys Glu Leu Cys Gln Gln Ala
435 440 445

Arg Tyr Gln Thr Ala Cys Glu Gln Pro Gly Gln Lys Trp Gln Cys Ile
450 455 460

Glu Asp Thr Ser Gly Lys Leu Arg Ile His Lys Cys Lys Gly Pro Ser
465 470 475 480

Asp Leu Leu Thr Val Arg Gln Ser Thr Arg Asn Leu Tyr Ala Arg Gly
485 490 495

Phe His Asp Lys Asp Lys Glu Cys Ser Cys Arg Glu Ser Gly Tyr Arg
500 505 510

Ala Ser Arg Ser Gln Arg Lys Ser Gln Arg Gln Phe Leu Arg Asn Gln
515 520 525

Gly Thr Pro Lys Tyr Lys Pro Arg Phe Val His Thr Arg Gln Thr Arg
530 535 540

Ser Leu Ser Val Glu Phe Glu Gly Glu Ile Tyr Asp Ile Asn Leu Glu
545 550 555 560

Glu Glu Glu Glu Leu Gln Val Leu Gln Pro Arg Asn Ile Ala Lys Arg
565 570 575

His Asp Glu Gly His Lys Gly Pro Arg Asp Leu Gln Ala Ser Ser Gly
580 585 590

Gly Asn Arg Gly Arg Met Leu Ala Asp Ser Ser Asn Ala Val Gly Pro
595 600 605

Pro Thr Thr Val Arg Val Thr His Lys Cys Phe Ile Leu Pro Asn Asp
610 615 620

Ser Ile His Cys Glu Arg Glu Leu Tyr Gln Ser Ala Arg Ala Trp Lys
625 630 635 640

Asp His Lys Ala Tyr Ile Asp Lys Glu Ile Glu Ala Leu Gln Asp Lys
645 650 655

Ile Lys Asn Leu Arg Glu Val Arg Gly His Leu Lys Arg Arg Lys Pro
660 665 670

Glu Glu Cys Ser Cys Ser Lys Gln Ser Tyr Tyr Asn Lys Glu Lys Gly
675 680 685

Val Lys Lys Gln Glu Lys Leu Lys Ser His Leu His Pro Phe Lys Glu
690 695 700

Ala Ala Gln Glu Val Asp Ser Lys Leu Gln Leu Phe Lys Glu Asn Asn
705 710 715 720

Arg Arg Arg Lys Lys Glu Arg Lys Glu Lys Arg Arg Gln Arg Lys Gly
725 730 735

Glu Glu Cys Ser Leu Pro Gly Leu Thr Cys Phe Thr His Asp Asn Asn
740 745 750

His Trp Gln Thr Ala Pro Phe Trp Asn Leu Gly Ser Phe Cys Ala Cys
755 760 765

Thr Ser Ser Asn Asn Asn Thr Tyr Trp Cys Leu Arg Thr Val Asn Glu
770 775 780

Thr His Asn Phe Leu Phe Cys Glu Phe Ala Thr Gly Phe Leu Glu Tyr
785 790 795 800

Phe Asp Met Asn Thr Asp Pro Tyr Gln Leu Thr Asn Thr Val His Thr
805 810 815

Val Glu Arg Gly Ile Leu Asn Gln Leu His Val Gln Leu Met Glu Leu
820 825 830

Arg Ser Cys Gln Gly Tyr Lys Gln Cys Asn Pro Arg Pro Lys Asn Leu
835 840 845

Asp Val Gly Asn Lys Asp Gly Gly Ser Tyr Asp Leu His Arg Gly Gln
850 855 860

Leu Trp Asp Gly Trp Glu Gly
865 870

<210> 4 

<211> 4321 

<212> DNA

<213> homo sapiens

<220>
<221> CDS

<222> (510)...(3125)
<400> 4 

cccacgcgtc cggctaatga atcttggggc cggtgtcggg ccggggcggc ttgatcggca 60 
actaggaaac cccaggcgca gaggccagga gcgagggcag cgaggatcag aggccaggcc 120 
ttcccggctg ccggcgctcc tcggaggtca gggcagatga ggaacatgac tctccccctt 180 
cggaggagga aggaagtccc gctgccacct tatctctgct cctctgcctc ctccctgtte 240 
ccagagcttt ttctctagag aagattttga aggcggcttt tgtgctgacg gccacccacc 300 
atcatctaaa gaagataaac ttggcaaatg acatgeaggt tcttcaaggc agaataattg 360 
cagaaaatct tcaaaggacc ctatotgcag atgttotgaa tacctctgag aatagagatt 420 
gattattcaa ccaggatacc taattcaaga actccagaaa tcaggagacg gagacatttt 480 
gtcagttttg caacattgga ccaaataoa atg aag tat tct tgc tgt get ctg 533 
&et Lys Tyr Ser Cys cys Ala Leu 
1 5 

gtt ttg get gtc ctg ggc aca gaa ttg ctg gga age etc tgt teg act 581 
Val Leu Ala Val Leu Gly Thr Glu Leu Leu Gly Ser Leu Cys Ser Thr 
10 15 20 

gtc aga tee ccg agg ttc aga gga egg ata cag cag gaa cga aaa aac 629 
Val Arg Ser Pro Arg Phe Arg Gly Arg He Gin Gin Glu Arg Lys Asn 
25 30 35 40 

ate cga ccc aac att att ctt gtg ctfc acc gat gat caa gat gtg gag 677 
He Arg Pro Asn lie He Leu Val Leu Thr Asp Asp Gin Asp Val Glu 
45 SO 55 

ctg ggg tec ctg caa gtc atg aac aaa acg aga aag att atg gaa cat 725 
Leu Gly Ser Leu Gin Val Met Asn Lys Thr Arg Lys He Met Glu His 
60 65 70 

ggg 333 gec acc ttc ate aat gec ttt gtg act aca ccc atg tgc tgc 773 

6 
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Gly Gly Ala Thr Phe lie Asn Ala Phe Val Thr Thr Pro Met Cvs Cvs 
75 80 85 

ccg tea egg tec tec atg etc . acc ggg aag tat gtg cac aat cac aat 
Pro Ser Arg Ser Ser Met Leu Thr Gly Lys Tyr Val His Asn His Asn 
90 95 10 0 

gtc tae acc aac aac gag aac tgc tct tec cce teg tgg cag gee atg 
Val Tyr Thr Asn Asn Glu Asn Cys Ser Ser Pro Ser Trp Gin Ala Met 
105 110 us 120 

cat gag cot egg act ttt get gta tat ctt aac aac act ggc tac aga 
His Glu Pro Arg Thr Phe Ala Val Tyr Leu Asn Asn Thr Gly Tyr Arg 
125 130 135 

aca gee ttt ttt gga aaa tac etc aat gaa tat aat ggc age tac ate 
Thr Ala Phe Phe Gly Lys Tyr Leu Asn Glu Tyr Asn Gly Ser Tyr He 
1*0 145 150 

cce cct ggg tgg cga gaa tgg ctt gga tta ate aag aat tct cgc ttc 
Pro Pro Gly Trp Arg Glu Trp Leu Gly Leu He Lys Asn Ser Arg Phe 
155 160 165 

tat aat tac act gtt tgt cgc aat ggc ate aaa gaa aag cat gga ttt 
Tyr Asn Tyr Thr Val Cys Arg Asn Gly He Lys Glu Lys His Gly Phe 
170 175 180 

gat tat gca aag gac tac ttc aca gac tta ate act aac gag age att 
Asp Tyr Ala Lys Asp Tyr Phe Thr Asp Leu He Thr Asn Glu Ser He 
185 190 195 2Q0 

aat tac ttc aaa atg tct aag aga atg tat cce cat agg cce gtt atg 
Asn Tyr Phe Lys Met Ser Lys Arg Met Tyr Pro His Arg Pro Val Met 
205 210 215 

Atg gtg ate age cac get gcg cce cac ggc cce gag gac tea gee cca 
Met Val He Ser His Ala Ala Pro His Gly Pro Glu Asp Ser Ala Pro 
220 225 230 

cag ttt tct aaa ctg tac cce aat got tec eaa cac ata act cct act 
Gin Phe Ser Lys Leu Tyr Pro Asn Ala Ser Gin His He Thr Pro Ser 
235 240 245 

tat aac tat gca cca aat atg gat aaa cac tgg att atg cag tac aca 
Tyr Asn Tyr Ala Pro Asn Met Asp Lys His Trp He Met Gin Tyr Thr 
250 255 260 

gga cca atg ctg cce ate cac atg gaa ttt aca aac att eta cag cgc 
Gly Pro Met Leu Pro He His Met Glu Phe Thr Asn He Leu Gin Arg 
265 270 275 280 

aaa agg etc cag act ttg atg tea gtg gat gat tct gtg gag agg ctg 
Lys Arg Leu Gin Thr Leu Met Ser Val Asp Asp Ser Val Glu Arq Leu 
285 290 295 

tat aac atg etc gtg gag acg ggg gag ctg gag aat act tac ate att 
Tyr Asn Met Leu Val Glu Thr Gly Glu Leu Glu Asn Thr Tyr He lie 
300 305 310 

tac acc gee gac cat ggt tac cat att ggg cag ttt gga ctg gtc aag 
Tyr Thr Ala Asp His Gly Tyr His He Gly Gin Phe Gly Leu Val Lys 
315 320 325 

ggg aaa tec atg cca tat gac ttt gat att cgt gtg cct ttt ttt att 
Gly Lys Ser Met Pro Tyr Asp Phe Asp He Arg Val Pro Phe Phe He 
330 335 340 
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cgt ggt cca agt gta gaa cca gga tea ata gtc tea cag ate gtt etc 
Arg Gly Pro Ser Val Glu Pro Gly Ser He Val Pro Gin He Val Leu 
345 3 50 355 360 

aac att gac ttg gec ccc acg ate ctg gat att get ggg etc gac aca 
Asn He asp Leu Ala Pro Thr He Leu Asp He Ala Gly Leu Asp Thr 
365 370 375 

cct cet gat gtg gac ggc aag tct gtc etc aaa ct-fc ctg gac cca gaa 
Pro Pro Asp Val Asp Gly Lys Ser Val Leu Lys Leu Leu Asp Pro Glu 
380 385 

aag cca ggt aac agg ttt cga aca aac aag aag gec aaa att tgg cgt 
Lys Pro Gly Asn Arg Phe Arg Thr Asn Lys Lys Ala Lys He Trp Arg 
395 400 405 

gat aca ttc eta gtg gaa aga gge aaa ttt eta cgt aag aag gaa gaa 
Asp Thr Phe Leu Val Glu Arg Gly Lys Phe Leu Arg Lys Lys Glu Glu 
410 415 420 

tec age aag aat ate caa cag tea aat cac ttg ccc aaa feat gaa egg 
Ser Ser Lys Asn He Gin Gin Ser Asn His Leu Pro Lys Tyr Glu Arg 
425 «0 435 440 

gtc aaa gaa cta tgc cag cag gcc agg tac cag aca gcc tgt gaa caa 
Val Lys Glu Leu Cys Gln Gln Ala Arg Tyr Gln Thr Ala Cys Glu Gln 
445 450 455 

ccg ggg cag aag tgg caa tgc att gag gat aca tct ggc aag ctt cga 
Pro Gly Gln Lys Trp Gln Cys Ile Glu Asp Thr Ser Gly Lys Leu Arg 
460 465 470 

att cac aag tgt aaa gga ccc agt gac ctg ctc aca gtc cgg cag agc 
Ile His Lys Cys Lys Gly Pro Ser Asp Leu Leu Thr Val Arg Gln Ser 
475 480 485 

acg cgg aac ctc tac gct cgc ggc ttc cat gac aaa gac aaa gag tgc 
Thr Arg Asn Leu Tyr Ala Arg Gly Phe His Asp Lys Asp Lys Glu Cys 
490 495 500 

agt tgt agg gag tct ggt tac cgt gcc agc aga agc caa aga aag agt 
Ser Cys Arg Glu Ser Gly Tyr Arg Ala Ser Arg Ser Gln Arg Lys Ser 
505 510 515 520 

caa cgg caa ttc ttg aga aac cag ggg act cca aag tac aag ccc aga 
Gln Arg Gln Phe Leu Arg Asn Gln Gly Thr Pro Lys Tyr Lys Pro Arg 
525 530 535 

ttt gtc cat act cgg cag aca cgt tcc ttg tcc gtc gaa ttt gaa ggt 
Phe Val His Thr Arg Gln Thr Arg Ser Leu Ser Val Glu Phe Glu Gly 
540 545 550 

gaa ata tat gac ata aat ctg gaa gaa gaa gaa gaa ttg caa gtg ttg 
Glu Ile Tyr Asp Ile Asn Leu Glu Glu Glu Glu Glu Leu Gln Val Leu 
555 560 565 

caa cca aga aac att gct aag cgt cat gat gaa ggc cac aag ggg cca 
Gln Pro Arg Asn Ile Ala Lys Arg His Asp Glu Gly His Lys Gly Pro 
570 575 580 

aga gat ctc cag gct tcc agt ggt ggc aac agg ggc agg atg ctg gca 
Arg Asp Leu Gln Ala Ser Ser Gly Gly Asn Arg Gly Arg Met Leu Ala 
585 590 595 600 

gat agc agc aac gcc gtg ggc cca cct acc act gtc cga gtg aca cac 
Asp Ser Ser Asn Ala Val Gly Pro Pro Thr Thr Val Arg Val Thr His 
605 610 615 

aag tgt ttt att ctt ccc aat gac tct atc cat tgt gag aga gaa ctg 2405 
Lys Cys Phe Ile Leu Pro Asn Asp Ser Ile His Cys Glu Arg Glu Leu 
620 625 630 

tac caa tcg gcc aga gcg tgg aag gac cat aag gca tac att gac aaa 2453 
Tyr Gln Ser Ala Arg Ala Trp Lys Asp His Lys Ala Tyr Ile Asp Lys 
635 640 645 

gag att gaa gct ctg caa gat aaa att aag aat tta aga gaa gtg aga 2501 
Glu Ile Glu Ala Leu Gln Asp Lys Ile Lys Asn Leu Arg Glu Val Arg 
650 655 660 

gga cat ctg aag aga agg aag cct gag gaa tgt agc tgc agt aaa caa 2549 
Gly His Leu Lys Arg Arg Lys Pro Glu Glu Cys Ser Cys Ser Lys Gln 
665 670 675 680 

agc tat tac aat aaa gag aaa ggt gta aaa aag caa gag aaa tta aag 2597 
Ser Tyr Tyr Asn Lys Glu Lys Gly Val Lys Lys Gln Glu Lys Leu Lys 
685 690 695 

agc cat ctt cac cca ttc aag gag gct gct cag gaa gta gat agc aaa 2645 
Ser His Leu His Pro Phe Lys Glu Ala Ala Gln Glu Val Asp Ser Lys 
700 705 710 

ctg caa ctt ttc aag gag aac aac cgt agg agg aag aag gag agg aag 2693 
Leu Gln Leu Phe Lys Glu Asn Asn Arg Arg Arg Lys Lys Glu Arg Lys 
715 720 725 

gag aag aga cgg cag agg aag ggg gaa gag tgc agc ctg cct ggc ctc 2741 
Glu Lys Arg Arg Gln Arg Lys Gly Glu Glu Cys Ser Leu Pro Gly Leu 
730 735 740 

act tgc ttc acg cat gac aac aac cac tgg cag aca gcc ccg ttc tgg 2789 
Thr Cys Phe Thr His Asp Asn Asn His Trp Gln Thr Ala Pro Phe Trp 
745 750 755 760 

aac ctg gga tct ttc tgt gct tgc acg agt tct aac aat aac acc tac 2837 
Asn Leu Gly Ser Phe Cys Ala Cys Thr Ser Ser Asn Asn Asn Thr Tyr 
765 770 775 

tgg tgt ttg cgt aca gtt aat gag acg cat aat ttt ctt ttc tgt gag 2885 
Trp Cys Leu Arg Thr Val Asn Glu Thr His Asn Phe Leu Phe Cys Glu 
780 785 790 

ttt gct act ggc ttt ttg gag tat ttt gat atg aat aca gat cct tat 2933 
Phe Ala Thr Gly Phe Leu Glu Tyr Phe Asp Met Asn Thr Asp Pro Tyr 
795 800 805 

cag ctc aca aat aca gtg cac acg gta gaa cga ggc att ttg aat cag 2981 
Gln Leu Thr Asn Thr Val His Thr Val Glu Arg Gly Ile Leu Asn Gln 
810 815 820 

cta cac gta caa cta atg gag ctc aga agc tgt caa gga tat aag cag 3029 
Leu His Val Gln Leu Met Glu Leu Arg Ser Cys Gln Gly Tyr Lys Gln 
825 830 835 840 

tgc aac cca aga cct aag aat ctt gat gtt gga aat aaa gat gga gga 3077 
Cys Asn Pro Arg Pro Lys Asn Leu Asp Val Gly Asn Lys Asp Gly Gly 
845 850 855 

agc tat gac cta cac aga gga cag tta tgg gat gga tgg gaa ggt taa 3125 
Ser Tyr Asp Leu His Arg Gly Gln Leu Trp Asp Gly Trp Glu Gly * 
860 865 870 

tcagccccgt ctcactgcag acatcaactg gcaaggccta gaggagctac acagtgtgaa 3185 
tgaaaacatc tatgagtaca gacaaaacta cagacttagt ctggtggact ggactaatta 3245 
cttgaaggat ttagatagag tatttgcact gctgaagagt cactatgagc aaaataaaac 3305 

aaataagact caaactgctc aaagtgacgg gttcttggtt gtctctgctg agcacgctgt 3365 

gtcaatggag atggcctctg ctgactcaga tgaagaccca aggcataagg ttgggaaaac 3425 

acctcatttg accttgccag ctgaccttca aaccctgcat ttgaaccgac caacattaag 3485 

tccagagagt aaacttgaat ggaataacga cattccagaa gttaatcatt tgaattctga 3545 

acactggaga aaaaccgaaa aatggacggg gcatgaagag actaatcatc tggaaaccga 3605 

tttcagtggc gatggcatga cagagctaga gctcgggccc agccccaggc tgcagcccat 3665 

tcgcaggcac ccgaaagaac ttccccagta tggtggtcct ggaaaggaca tttttgaaga 3725 

tcaactatat cttcctgtgc attccgatgg aatttcagtt catcagatgt tcaccatggc 3785 

caccgcagaa caccgaagta attccagcat agcggggaag atgttgacca aggtggagaa 3845 

gaatcacgaa aaggagaagt cacagcacct agaaggcagc gcctcctctt cactctcctc 3905 

tgattagatg aaactgttac cttaccctaa acacagtatt tctttttaac ttttttattt 3965 

gtaaactaat aaaggkaatc acagccacca acattccaag ctaccctggg tacctttgtg 4025 

cagtagaagc tagtgagcat gtgagcaagc ggtgtgcaca cggagactca tcgttataat 4085 

ttactatctg ccaaggagta gaaagaaagg ctggggatat ttgggttggc tttggktttg 4145 

attttttgct tggttggttg gtttgkacta aaacagtatt atcttttgaa tatcgtaggg 4205 

acataarkww wwwmmwkjctw wtctttawyrora kakgsywrra wkgggstyty tskkrkstatw 4265 

atnwykwscmc cyskkrwwaw tywywmmywc mykytssstg rykrnktaat gaagtt 4321 

<210> 5 

<211> 569 

<212> PRT 

<213> homo sapiens 



<400> 5 
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Gly 


Val 
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He 
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Leu 
100 
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Gly 


Arg 
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He 
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Gly 
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He 
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Gly 
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Thr 


Phe 
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Gly 
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Val 
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Ala 
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He 
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Val 
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Leu 
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{•Set 
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Val 
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Trp 


Ala 


Leu 
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Trp 
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Trp Glu 
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Gly 


Gly 


Val 
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Gly 
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Leu 


Gly 
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Val 


His 
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Ser 


Pro 


Leu 
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Lys Arg 
335 
<210> 6 

<211> 2940 

<212> DNA 

<213> homo sapiens 

<220> 
<221> CDS 

<222> (334)...(2043) 
<400> 6 

ccacgcgtcc gcccacgcgt ccggctgcca cgccgcgtct caggctggcc gggctgagcc 60 
ggggaagagg gagcaaaggc ggcgcagggc ctgcgcttag gcagcgggag gcagctcggc 120 
gcgggcctga cctccccaga gcgccccgct gcggccgagc agatccggcc cagccgtccg 180 
gcagccagtc ccggaccaga cactggaccg tccccggggg gcgctgaact ccctcgcagc 240 
atccgagccg gcgggccggt ggtgcgccct gggcgcgcga ggtggtgagg ccccaggagc 300 
ccggcgcgcc gggacacgcg ggccggcttg gcg atg cac acc ctc act ggc ttc 354 
Met His Thr Leu Thr Gly Phe 
1 5 

tct ctg gtc agc ctg ctc agc ttc ggc tac ctg tcc tgg gac tgg gcc 402 
Ser Leu Val Ser Leu Leu Ser Phe Gly Tyr Leu Ser Trp Asp Trp Ala 
10 15 20 

aag ccg agc ttc gtg gcc gac ggg ccc ggg gag gct ggc gag cag ccc 450 
Lys Pro Ser Phe Val Ala Asp Gly Pro Gly Glu Ala Gly Glu Gln Pro 
25 30 35 

tcg gcc gct ccg ccc cag cct ccc cac atc atc ttc atc ctc acg gac 498 
Ser Ala Ala Pro Pro Gln Pro Pro His Ile Ile Phe Ile Leu Thr Asp 
40 45 50 55 

gac caa ggc tac cac gac gtg ggc tac cat ggt tca gat atc gag acc 546 
Asp Gln Gly Tyr His Asp Val Gly Tyr His Gly Ser Asp Ile Glu Thr 
60 65 70 

cct acg ctg gac agg ctg gcg gcc aag ggg gtc aag ttg gag aat tat 594 
Pro Thr Leu Asp Arg Leu Ala Ala Lys Gly Val Lys Leu Glu Asn Tyr 
75 80 85 

tac atc cag ccc atc tgc acg cct tcg cgg agc cag ctc ctc act ggc 642 
Tyr Ile Gln Pro Ile Cys Thr Pro Ser Arg Ser Gln Leu Leu Thr Gly 
90 95 100 

agg tac cag atc cac aca gga ctc cag cat tcc atc atc cgc cca cag 690 
Arg Tyr Gln Ile His Thr Gly Leu Gln His Ser Ile Ile Arg Pro Gln 
105 110 115 

cag ccc aac tgc ctg ccc ctg gac cag gtg aca ctg cca cag aag ctg 738 
Gln Pro Asn Cys Leu Pro Leu Asp Gln Val Thr Leu Pro Gln Lys Leu 
120 125 130 135 

cag gag gca ggt tat tcc acc cat atg gtg ggc aag tgg cac ctg ggc 786 
Gln Glu Ala Gly Tyr Ser Thr His Met Val Gly Lys Trp His Leu Gly 
140 145 150 

ttc tac cgg aag gag tgt ctg ccc acc cgt cgg ggc ttc gac acc ttc 834 
Phe Tyr Arg Lys Glu Cys Leu Pro Thr Arg Arg Gly Phe Asp Thr Phe 
155 160 165 

ctg ggc tcg ctc acg ggc aat gtg gac tat tac acc tat gac aac tgt 882 
Leu Gly Ser Leu Thr Gly Asn Val Asp Tyr Tyr Thr Tyr Asp Asn Cys 
170 175 180 

gat ggc cca ggc gtg tgc ggc ttc gac ctg cac gag ggt gag aat gtg 930 
Asp Gly Pro Gly Val Cys Gly Phe Asp Leu His Glu Gly Glu Asn Val 
185 190 195 

gcc tgg ggg ctc agc ggc cag tac tcc act atg ctt tac gcc cag cgc 978 
Ala Trp Gly Leu Ser Gly Gln Tyr Ser Thr Met Leu Tyr Ala Gln Arg 
200 205 210 215 

gcc agc cat atc ctg gcc agc cac agc cct cag cgt ccc ctc ttc ctc 1026 
Ala Ser His Ile Leu Ala Ser His Ser Pro Gln Arg Pro Leu Phe Leu 
220 225 230 

tat gtg gcc ttc cag gca gta cac aca ccc ctg cag tcc cct cgt gag 1074 
Tyr Val Ala Phe Gln Ala Val His Thr Pro Leu Gln Ser Pro Arg Glu 
235 240 245 

tac ctg tac cgc tac cgc acc atg ggc aat gtg gcc cgg cgg aag tac 1122 
Tyr Leu Tyr Arg Tyr Arg Thr Met Gly Asn Val Ala Arg Arg Lys Tyr 
250 255 260 

gcg gcc atg gtg acc tgc atg gat gag gct gtg cgc aac atc acc tgg 1170 
Ala Ala Met Val Thr Cys Met Asp Glu Ala Val Arg Asn Ile Thr Trp 
265 270 275 

gcc ctc aag cgc tac ggt ttc tac aac aac agt gtc atc atc ttc tcc 1218 
Ala Leu Lys Arg Tyr Gly Phe Tyr Asn Asn Ser Val Ile Ile Phe Ser 
280 285 290 295 

agt gac aat ggt ggc cag act ttc tcg ggg ggc agc aac tgg ccg ctc 1266 
Ser Asp Asn Gly Gly Gln Thr Phe Ser Gly Gly Ser Asn Trp Pro Leu 
300 305 310 

cga gga cgc aag ggc act tat tgg gaa ggt ggc gtg cgg ggc cta ggc 1314 
Arg Gly Arg Lys Gly Thr Tyr Trp Glu Gly Gly Val Arg Gly Leu Gly 
315 320 325 

ttt gtc cac agt ccc ctg ctc aag cga aag caa cgg aca agc cgg gca 1362 
Phe Val His Ser Pro Leu Leu Lys Arg Lys Gln Arg Thr Ser Arg Ala 
330 335 340 

ctg atg cac atc act gac tgg tac ccg acc ctg gtg ggt ctg gca ggt 1410 
Leu Met His Ile Thr Asp Trp Tyr Pro Thr Leu Val Gly Leu Ala Gly 
345 350 355 

ggt acc acc tca gca gcc gat ggg cta gat ggc tac gac gtg tgg ccg 1458 
Gly Thr Thr Ser Ala Ala Asp Gly Leu Asp Gly Tyr Asp Val Trp Pro 
360 365 370 375 

gcc atc agc gag ggc cgg gcc tca cca cgc acg gag atc ctg cac aac 1506 
Ala Ile Ser Glu Gly Arg Ala Ser Pro Arg Thr Glu Ile Leu His Asn 
380 385 390 

att gac cca ctc tac aac cat gcc cag cat ggc tcc ctg gag ggc ggc 1554 
Ile Asp Pro Leu Tyr Asn His Ala Gln His Gly Ser Leu Glu Gly Gly 
395 400 405 

ttt ggc atc tgg aac acc gcc gtg cag gct gcc atc cgc gtg ggt gag 1602 
Phe Gly Ile Trp Asn Thr Ala Val Gln Ala Ala Ile Arg Val Gly Glu 
410 415 420 

tgg aag ctg ctg aca gga gac ccc ggc tat ggc gat tgg atc cca ccg 1650 
Trp Lys Leu Leu Thr Gly Asp Pro Gly Tyr Gly Asp Trp Ile Pro Pro 
425 430 435 

cag aca ctg gcc acc ttc ccg ggt agc tgg tgg aac ctg gaa cga atg 1698 
Gln Thr Leu Ala Thr Phe Pro Gly Ser Trp Trp Asn Leu Glu Arg Met 
440 445 450 455 

gcc agt gtc cgc cag gcc gtg tgg ctc ttc aac atc agt gct gac cct 1746 
Ala Ser Val Arg Gln Ala Val Trp Leu Phe Asn Ile Ser Ala Asp Pro 
460 465 470 

tat gaa cgg gag gac ctg gct ggc cag cgg cct gat gtg gtc cgc acc 1794 
Tyr Glu Arg Glu Asp Leu Ala Gly Gln Arg Pro Asp Val Val Arg Thr 
475 480 485 



ctg ctg gct cgc ctg gcc gaa tat aac cgc aca gcc atc ccg gta cgc 1842 
Leu Leu Ala Arg Leu Ala Glu Tyr Asn Arg Thr Ala Ile Pro Val Arg 
490 495 500 

tac cca gct gag aac ccc cgg gct cat cct gac ttt aat ggg ggt gct 1890 
Tyr Pro Ala Glu Asn Pro Arg Ala His Pro Asp Phe Asn Gly Gly Ala 
505 510 515 

tgg ggg ccc tgg gcc agt gat gag gaa gag gag gaa gag gaa ggg agg 1938 
Trp Gly Pro Trp Ala Ser Asp Glu Glu Glu Glu Glu Glu Glu Gly Arg 
520 525 530 535 

gct cga agc ttc tcc cgg ggt cgt cgc aag aaa aaa tgc aag att tgc 1986 
Ala Arg Ser Phe Ser Arg Gly Arg Arg Lys Lys Lys Cys Lys Ile Cys 
540 545 550 

aag ctt cga tcc ttt ttc cgt aaa ctc aac acc agg cta atg tcc caa 2034 
Lys Leu Arg Ser Phe Phe Arg Lys Leu Asn Thr Arg Leu Met Ser Gln 
555 560 565 

cgg atc tga tggtggggag ggagaaaact gtcctttaga ggatcttccc 2083 
Arg Ile * 



cactccggct tggccctgct gtttctcagg gagaagcctg tcacatctcc atctacaggg 2143 

agttggaggg tgtagagtcc cttggttgaa cagggtaggg agcctggata ggagtgggtg 2203 

ggaataaacc agactgggat gcctgtgtct cagtcctgcc tcctcacgga cttgctctgt 2263 

gacctcaggt gacccacatg agcttttagc ctcagtttcc tcatctgtaa aatgagctct 2323 

aatgactttg tgactctttg gtgtggccct ggagcctggg gccacggtgg agttcctggc 2383 

cggccttgcc acttgacaac tcctttaagg cttccccctt aacacgggat ccctgtggtg 2443 

gtgtttggga gttgcctgga ggcaactcca agcctggccc ccagctgaag catggcaatc 2503 

tggctgctct ctacagggac ccccaagcgc tgtgggtgga gggcaggggt cgggggggtt 2563 

gaccttcttg ggtcttcaca tggcctaggc cagtcctccg gtcagactgg tgtcaggcac 2623 

cgtggtgcaa aattcctctt ctggcccctc cagtacccag agaaactggc tgggccatta 2683 

actgctgcag caccaagggt ggtagaaaga gctgtgaaga gcccccaaac cagtaccagg 2743 

acacctgggt tctcctgtga cctggggcac agttcttgcc ctctaggcct tgatttcccc 2803 

acctgcaagt ggggatgcca gccctggctc tgcctccttc atgaggctct ggaagactgg 2863 

ccaaggttgt ggaggagctt gtgaacttga ttaaagtgtc gtaacatgga aaaaaaaaaa 2923 

aaaaaaaaaa agggcgg 2940 

<210> 7 
<211> 599 
<212> PRT 

<213> homo sapiens 
<400> 7 

Met Ala Pro Arg Gly Cys Ala Gly His Pro Pro Pro Pro Ser Pro Gln 

1 5 10 15 

Ala Cys Val Cys Pro Gly Lys Met Leu Ala Met Gly Ala Leu Ala Gly 

20 25 30 

Phe Trp Ile Leu Cys Leu Leu Thr Tyr Gly Tyr Leu Ser Trp Gly Gln 

35 40 45 

Ala Leu Glu Glu Glu Glu Glu Gly Ala Leu Leu Ala Gln Ala Gly Glu 

50 55 60 

Lys Leu Glu Pro Ser Thr Thr Ser Thr Ser Gln Pro His Leu Ile Phe 
65 70 75 80 

Ile Leu Ala Asp Asp Gln Gly Phe Arg Asp Val Gly Tyr His Gly Ser 

85 90 95 

Glu Ile Lys Thr Pro Thr Leu Asp Lys Leu Ala Ala Glu Gly Val Lys 

100 105 110 

Leu Glu Asn Tyr Tyr Val Gln Pro Ile Cys Thr Pro Ser Arg Ser Gln 

115 120 125 

Phe Ile Thr Gly Lys Tyr Gln Ile His Thr Gly Leu Gln His Ser Ile 

130 135 140 

Ile Arg Pro Thr Gln Pro Asn Cys Leu Pro Leu Asp Asn Ala Thr Leu 
145 150 155 160 

Pro Gln Lys Leu Lys Glu Val Gly Tyr Ser Thr His Met Val Gly Lys 

165 170 175 

Trp His Leu Gly Phe Tyr Arg Lys Glu Cys Met Pro Thr Arg Arg Gly 

180 185 190 

Phe Asp Thr Phe Phe Gly Ser Leu Leu Gly Ser Gly Asp Tyr Tyr Thr 

195 200 205 

His Tyr Lys Cys Asp Ser Pro Gly Met Cys Gly Tyr Asp Leu Tyr Glu 

210 215 220 

Asn Asp Asn Ala Ala Trp Asp Tyr Asp Asn Gly Ile Tyr Ser Thr Gln 
225 230 235 240 

Met Tyr Thr Gln Arg Val Gln Gln Ile Leu Ala Ser His Asn Pro Thr 

245 250 255 

Lys Pro Ile Phe Leu Tyr Ile Ala Tyr Gln Ala Val His Ser Pro Leu 

260 265 270 

Gln Ala Pro Gly Arg Tyr Phe Glu His Tyr Arg Ser Ile Ile Asn Ile 

275 280 285 

Asn Arg Arg Arg Tyr Ala Ala Met Leu Ser Cys Leu Asp Glu Ala Ile 

290 295 300 

Asn Asn Val Thr Leu Ala Leu Lys Thr Tyr Gly Phe Tyr Asn Asn Ser 
305 310 315 320 

Ile Ile Ile Tyr Ser Ser Asp Asn Gly Gly Gln Pro Thr Ala Gly Gly 

325 330 335 

Ser Asn Trp Pro Leu Arg Gly Ser Lys Gly Thr Tyr Trp Glu Gly Gly 

340 345 350 

Ile Arg Ala Val Gly Phe Val His Ser Pro Leu Leu Lys Asn Lys Gly 

355 360 365 

Thr Val Cys Lys Glu Leu Val His Ile Thr Asp Trp Tyr Pro Thr Leu 

370 375 380 

Ile Ser Leu Ala Glu Gly Gln Ile Asp Glu Asp Ile Gln Leu Asp Gly 
385 390 395 400 

Tyr Asp Ile Trp Glu Thr Ile Ser Glu Gly Leu Arg Ser Pro Arg Val 

405 410 415 

Asp Ile Leu His Asn Ile Asp Pro Ile Tyr Thr Lys Ala Lys Asn Gly 

420 425 430 

Ser Trp Ala Ala Gly Tyr Gly Ile Trp Asn Thr Ala Ile Gln Ser Ala 

435 440 445 

Ile Arg Val Gln His Trp Lys Leu Leu Thr Gly Asn Pro Gly Tyr Ser 

450 455 460 

Asp Trp Val Pro Pro Gln Ser Phe Ser Asn Leu Gly Pro Asn Arg Trp 
465 470 475 480 

His Asn Glu Arg Ile Thr Leu Ser Thr Gly Lys Ser Val Trp Leu Phe 

485 490 495 

Asn Ile Thr Ala Asp Pro Tyr Glu Arg Val Asp Leu Ser Asn Arg Tyr 

500 505 510 

Pro Gly Ile Val Lys Lys Leu Leu Arg Arg Leu Ser Gln Phe Asn Lys 

515 520 525 

Thr Ala Val Pro Val Arg Tyr Pro Pro Lys Asp Pro Arg Ser Asn Pro 

530 535 540 

Arg Leu Asn Gly Gly Val Trp Gly Pro Trp Tyr Lys Glu Glu Thr Lys 
545 550 555 560 

Lys Lys Lys Pro Ser Lys Asn Gln Ala Glu Lys Lys Gln Lys Lys Ser 

565 570 575 

Lys Lys Lys Lys Lys Lys Gln Gln Lys Ala Val Ser Gly Ser Thr Cys 

580 585 590 

His Ser Gly Val Thr Cys Gly 
595 

<210> 8 

<211> 2253 

<212> DNA 

<213> homo sapiens 

<220> 
<221> CDS 

<222> (324)...(2123) 
<400> 8 

SI£ CC H c ? cacgc3tc c 9tgg a gata ttaacttttt tctttttttt tttccttggt 60 
ggaagctgct ctagggaggg gggaggagga ggagaaagtg aaatgtgctg gagaagagcg 120 
ITtTaalttt lttt ttCCg t agtcccatcc attaagccat = a cttc?gga ! ff 2 tt£K 100 
lit 9 l g t P ga ^gag WWatttet tgecaggtgg agagtottca 240 
artetcacc? S^f 9 ^ CgC f goa ^ggcgcggg gcgcgtggtt ctccgcgtgg 30b 
agtctcacct gggacctgag tga atg gct ccc agg ggc tgt gcg ggg cat ccg 353 
Met Ala Pro Arg Gly Cys Ala Gly His Pro 
1 5 10 

cct ccg cct tct cca cag gcc tgt gtc tgt cct gga aag atg cta gca 401 
Pro Pro Pro Ser Pro Gln Ala Cys Val Cys Pro Gly Lys Met Leu Ala 
15 20 25 

atg ggg gcg ctg gca gga ttc tgg ate etc tgc etc etc act tat not 44 o 

Met Gly Ala Leu Ala Gly Phe Trp Ile Leu Cys Leu Leu Thr Tyr Gly 
30 35 40 

tac ctg tcc tgg ggc cag gcc tta gaa gag gag gaa gaa ggg gcc tta 497 
Tyr Leu Ser Trp Gly Gln Ala Leu Glu Glu Glu Glu Glu Gly Ala Leu 
45 50 55 

eta get caa get gga gag aaa eta gag ccc age aca act tec arc ten *A« 
Leu Ala Gln Ala Gly Glu Lys Leu Glu Pro Ser Thr Thr Ser Thr Ser 
60 65 70 

cag ccc cat ctc att ttc atc cta gcg gat gat cag gga ttt aga gat 593 
Gln Pro His Leu Ile Phe Ile Leu Ala Asp Asp Gln Gly Phe Arg Asp 
75 80 85 90 

£? rf t a ° S a ° g ? 3 tCt 989 att aaa aca cct a °t ctt gac aag etc 641 
Val Gly Tyr His Gly Ser Glu Ile Lys Thr Pro Thr Leu Asp Lys Leu 
95 100 105 

gct gcc gaa gga gtt aaa ctg gag aac tac tat gtc cag cct att tgc 
Ala Ala Glu Gly Val Lys Leu Glu Asn Tyr Tyr Val Gln Pro Ile Cys 
110 115 120 

aca cca tcc agg agt cag ttt att act gga aag tat cag ata cac acc 
Thr Pro Ser Arg Ser Gln Phe Ile Thr Gly Lys Tyr Gln Ile His Thr 
125 130 135 

gga ctt caa cat tct atc ata aga cct acc caa ccc aac tgt tta cct 
Gly Leu Gln His Ser Ile Ile Arg Pro Thr Gln Pro Asn Cys Leu Pro 
140 145 150 

ctg gac aat gcc acc cta cct cag aaa ctg aag gag gtt gga tat tca 
Leu Asp Asn Ala Thr Leu Pro Gln Lys Leu Lys Glu Val Gly Tyr Ser 
155 160 165 170 

acg cat atg gtc gga aaa tgg cac ttg ggt ttt tac aga aaa gaa tgc 
Thr His Met Val Gly Lys Trp His Leu Gly Phe Tyr Arg Lys Glu Cys 
175 180 185 

atg ccc acc aga aga gga ttt gat acc ttt ttt ggt tcc ctt ttg gga 
Met Pro Thr Arg Arg Gly Phe Asp Thr Phe Phe Gly Ser Leu Leu Gly 
190 195 200 

agt ggg gat tac tat aca cac tac aaa tgt gac agt cct ggg atg tgt 
Ser Gly Asp Tyr Tyr Thr His Tyr Lys Cys Asp Ser Pro Gly Met Cys 
205 210 215 

ggc tat gac ttg tat gaa aac gac aat gct gcc tgg gac tat gac aat 
Gly Tyr Asp Leu Tyr Glu Asn Asp Asn Ala Ala Trp Asp Tyr Asp Asn 
220 225 230 

ggc ata tac tcc aca cag atg tac act cag aga gta cag caa atc tta 
Gly Ile Tyr Ser Thr Gln Met Tyr Thr Gln Arg Val Gln Gln Ile Leu 
235 240 245 250 

gct tcc cat aac ccc aca aag cct ata ttt tta tat att gcc tat caa 
Ala Ser His Asn Pro Thr Lys Pro Ile Phe Leu Tyr Ile Ala Tyr Gln 
255 260 265 

gct gtt cat tca cca ctg caa gct cct ggc agg tat ttc gaa cac tac 
Ala Val His Ser Pro Leu Gln Ala Pro Gly Arg Tyr Phe Glu His Tyr 
270 275 280 

cga tcc att atc aac ata aac agg agg aga tat gct gcc atg ctt tcc 
Arg Ser Ile Ile Asn Ile Asn Arg Arg Arg Tyr Ala Ala Met Leu Ser 
285 290 295 

tgc tta gat gaa gca atc aac aac gtg aca ttg gct cta aag act tat 
Cys Leu Asp Glu Ala Ile Asn Asn Val Thr Leu Ala Leu Lys Thr Tyr 
300 305 310 

ggt ttc tat aac aac agc att atc att tac tct tca gat aat ggt ggc 
Gly Phe Tyr Asn Asn Ser Ile Ile Ile Tyr Ser Ser Asp Asn Gly Gly 
315 320 325 330 

cag cct acg gca gga ggg agt aac tgg cct ctc aga ggt agc aaa gga 
Gln Pro Thr Ala Gly Gly Ser Asn Trp Pro Leu Arg Gly Ser Lys Gly 
335 340 345 

aca tat tgg gaa gga ggg atc cgg gct gta ggc ttt gtg cat agc cca 
Thr Tyr Trp Glu Gly Gly Ile Arg Ala Val Gly Phe Val His Ser Pro 
350 355 360 

ctt ctg aaa aac aag gga aca gtg tgt aag gaa ctt gtg cac atc act 
Leu Leu Lys Asn Lys Gly Thr Val Cys Lys Glu Leu Val His Ile Thr 
365 370 375 

gac tgg tac ccc act ctc att tca ctg gct gaa gga cag att gat gag 
Asp Trp Tyr Pro Thr Leu Ile Ser Leu Ala Glu Gly Gln Ile Asp Glu 
380 385 390 

gac att caa cta gat ggc tat gat atc tgg gag acc ata agt gag ggt 
Asp Ile Gln Leu Asp Gly Tyr Asp Ile Trp Glu Thr Ile Ser Glu Gly 
395 400 405 410 

ctt cgc tca ccc cga gta gat att ttg cat aac att gac ccc ata tac 
Leu Arg Ser Pro Arg Val Asp Ile Leu His Asn Ile Asp Pro Ile Tyr 
415 420 425 

acc aag gca aaa aat ggc tcc tgg gca gca ggc tat ggg atc tgg aac 
Thr Lys Ala Lys Asn Gly Ser Trp Ala Ala Gly Tyr Gly Ile Trp Asn 
430 435 440 

act gca atc cag tca gcc atc aga gtg cag cac tgg aaa ttg ctt aca 
Thr Ala Ile Gln Ser Ala Ile Arg Val Gln His Trp Lys Leu Leu Thr 
445 450 455 



aaa agt gta tgg ctt ttc aac atc aca gcc gac cca tat gag agg gtg 1841 
Lys Ser Val Trp Leu Phe Asn Ile Thr Ala Asp Pro Tyr Glu Arg Val 
495 500 505 

gac cta tct aac agg tat cca gga atc gtg aag aag ctc cta cgg agg 1889 
Asp Leu Ser Asn Arg Tyr Pro Gly Ile Val Lys Lys Leu Leu Arg Arg 
510 515 520 

ctc tca cag ttc aac aaa act gca gtg ccg gtc agg tat ccc ccc aaa 1937 
Leu Ser Gln Phe Asn Lys Thr Ala Val Pro Val Arg Tyr Pro Pro Lys 
525 530 535 

gac ccc aga agt aac cct agg ctc aat gga ggg gtc tgg gga cca tgg 1985 
Asp Pro Arg Ser Asn Pro Arg Leu Asn Gly Gly Val Trp Gly Pro Trp 
540 545 550 

tat aaa gag gaa acc aag aaa aag aag cca agc aaa aat cag gct gag 2033 
Tyr Lys Glu Glu Thr Lys Lys Lys Lys Pro Ser Lys Asn Gln Ala Glu 
555 560 565 570 

aaa aag caa aag aaa agc aaa aaa aag aag aag aaa cag cag aaa gca 2081 
Lys Lys Gln Lys Lys Ser Lys Lys Lys Lys Lys Lys Gln Gln Lys Ala 
575 580 585 

gtc tca ggt tca act tgc cat tca ggt gtt act tgt gga taa 2123 
Val Ser Gly Ser Thr Cys His Ser Gly Val Thr Cys Gly * 
590 595 

gcacaaatat ttcctgtttg gttaaacttt aatcagttct tatctttcat ctgtttccta 2183 
ggtaaaccag caaatttggc tcgataatat cgctggccta agcgtcaggc ttgttttcat 2243 
gctgtgccac 2253 

<210> 9 
<211> 552 
<212> PRT 
<213> Artificial 
<220> 

<223> Pfam consensus sequence for human sulfatase 
<400> 9 

Pro Asn Ile Leu Leu Ile Leu Ala Asp Asp Leu Gly Ile Gly Asp Leu 

1 5 10 15 

Gly Cys Tyr Gly Asn Pro Thr Ile Arg Thr Pro Asn Ile Asp Arg Leu 

20 25 30 

Ala Glu Glu Gly Leu Arg Phe Thr Asn Ala Tyr Val Thr Thr Pro Leu 

35 40 45 

Cys Thr Pro Ser Arg Ala Ala Leu Leu Thr Gly Arg Tyr Pro His Arg 

50 55 60 

Thr Gly Met Tyr Thr Asn Asn Arg Ala Gly Val Leu Pro Phe Thr Gly 
65 70 75 80 

Trp Ser Leu Glu Gly Gly Leu Pro Leu Asp Glu Thr Thr Leu Pro Glu 

85 90 95 

Leu Leu Lys Glu Ala Gly Tyr Ala Thr Gly Met Val Gly Lys Trp His 

100 105 110 

Gly Tyr Asn Glu Glu Ser Ser Ala Ser Asp Phe Ala His Leu Pro Leu 

115 120 125 

Gly Arg Gly Phe Asp Tyr Phe Tyr Gly Asn Leu Gly Gly Glu Asp Gln 

130 135 140 

Trp Tyr Pro Leu Val Asp Ala Leu Leu Pro Phe Thr Asn Asp Thr Tyr 
145 150 155 160 

Thr Cys Glu Gly Gly Tyr Gly Phe Ser Lys Asp Val Ala Leu Lys Pro 

165 170 175 

Leu Gly Ala Leu Gly Val Asn Glu Val Glu Ala Pro Asp Lys Ala Leu 

180 185 190 

Ala Asp Tyr Lys Thr Ala Gly Ala Leu Asn Val Pro His His Val Phe 

195 200 205 

Glu Trp Ala Asp Arg Tyr Ala Gly Ala Val Asp Val Gly Arg Pro Phe 

210 215 220 

Leu Ala Val Leu Ile Phe Pro Arg Pro Ala Ala Cys Phe Leu Tyr Pro 
225 230 235 240 

Asn Ala Thr Val Val Ser Gln Pro Met Pro His Ser Pro Leu Thr Ala 

245 250 255 

Pro Arg Pro Trp Gln Leu Leu Ala Asp Glu Ala Leu Pro Phe Leu Glu 

260 265 270 

Arg Asn Gly Gln Arg Asp Lys Pro Phe Phe Leu Tyr Leu Ser Tyr Lys 

275 280 285 

His Val His Ile Pro Arg Asp Ala Pro Met Leu Phe Ser Ser Lys Asp 

290 295 300 

Phe Ala Gly Ser Ser Arg Arg Gly Leu Tyr Gly Leu Ile Leu Asp Ser 
305 310 315 320 

Val Glu Glu Met Asp Asp Gly Val Gly Arg Val Leu Asn Ala Leu Asp 

325 330 335 

Glu Leu Asn Gly Leu Leu Asp Asn Thr Leu Ile Ile Phe Thr Ser Leu 

340 345 350 

Leu Asp His Gly Gly His Leu Gly Ala His Gly His Leu Gly Ile Arg 

355 360 365 

Ala Gly Gly Ser Asn Gly Pro Phe Arg Gly Gly Lys Gly Thr Asn Leu 

370 375 380 

Tyr Glu Gly Gly Thr Arg Val Pro Leu Ile Val Arg Trp Pro Glu Gly 
385 390 395 400 

Ile Ile Ala Pro Gly Gln Val Ser Asp Glu Leu Val Ser Leu Met Asp 

405 410 415 

Leu Phe Pro Thr Ile Leu Asp Leu Ala Gly Ala Pro Leu Pro Gly Val 

420 425 430 

Ala Ala Gly Val Lys Asp Arg Ile Leu Asp Gly Val Ser Leu Leu Pro 

435 440 445 

Leu Leu Leu Gly Ala Ala Gly Ser Ser Arg His Glu Thr Leu Phe Tyr 

450 455 460 

Glu Ser Tyr Cys Asn Glu Gly Arg Gly Phe Leu Pro Ala Val Arg Trp 
465 470 475 480 

Gly Lys Lys Lys Ala His Phe Arg Thr Pro Asn Ile Ala Gly Trp Gln 

485 490 495 

Arg Val Asp Phe Asp Asp Val Trp Lys Leu Phe Asn Thr Val Glu Asp 

500 505 510 

Phe Asn Arg Ser Gly Asp Asp Ala Cys Arg His Gly Asp Val Cys Lys 

515 520 525 

Cys Leu Gly Lys Pro Arg Arg Ser Val Thr His His Asp Pro Pro Leu 

530 535 540 

Leu Tyr Asp Leu Ser Arg Asp Pro 
545 550 

<210> 10 
<211> 520 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Pfam consensus sequence for human sulfatase 
<400> 10 

Pro Asn Val Leu Leu Ile Leu Ala Asp Asp Leu Gly Ile Gly Asp Leu 

1 5 10 15 

Gly Cys Tyr Gly His Pro Thr Ile Arg Thr Pro Asn Leu Asp Arg Leu 

20 25 30 

Ala Glu Glu Gly Leu Arg Phe Thr Asn His Tyr Thr Ala Thr Pro Leu 

35 40 45 

Cys Ser Pro Ser Arg Ala Ala Leu Leu Thr Gly Arg Tyr Pro His Arg 

50 55 60 

His Gly Met Val Ser Asn Gly Arg Leu Gly Val Leu Gly Phe Thr Ala 
65 70 75 80 

Lys Ser Gly Gly Leu Pro Leu Asp Glu Thr Thr Leu Pro Glu Leu Leu 

85 90 95 

Lys Glu Ala Gly Tyr Ala Thr Gly Leu Val Gly Lys Trp His Leu Gly 

100 105 110 

Leu Asn Glu Asn Ser Asp Ala Ala Gly Asp Gly Glu His Leu Pro Leu 

115 120 125 

Gly Trp Arg Gly Phe Asp Tyr Phe Asp Gly Phe Leu Tyr Gly Ser Pro 

130 135 140 

Phe Thr Tyr Asp Glu Glu Asn Cys Asp Asn Gly Glu Gly Thr Glu Pro 
145 150 155 160 

Pro Glu Ala Tyr Pro Glu Gln Gly Trp Leu Pro Gln Ile Leu Gly Tyr 

165 170 175 

Tyr Leu Thr Asp Leu Leu Ala Asp Lys Ala Leu Gly Leu Leu Asp Val 

180 185 190 

Ala Ser Ala Ala Gly Arg Leu Leu Ala Lys Ala Leu Ala Ala Ser Arg 

195 200 205 

Pro Phe Phe Leu Tyr Ile Ser Pro Pro Ala Pro His Phe Ser Ile Leu 

210 215 220 

Phe Arg Asn Phe Lys Glu Val Ala Gln Pro Tyr Arg Ala Pro Gln Leu 
225 230 235 240 

Thr Gln Leu Phe Val Asp Glu Ala Ala Asp Phe Ile Glu Arg Asn Lys 

245 250 255 

Glu Lys Pro Phe Phe Leu Tyr Leu Ala Phe Leu Arg Leu His Val His 

260 265 270 

Thr Pro Leu Phe Ser Pro Ala Glu Asp Leu Glu Ser Lys Asp Phe Leu 

275 280 285 

Gly Arg Ser Gln Arg Gly Arg Tyr Gly Asp Leu Val Glu Glu Met Asp 

290 295 300 

Asp Leu Val Gly Arg Val Leu Asp Ala Leu Glu Asp Leu Gly Leu Leu 
305 310 315 320 

Asp Asn Thr Leu Val Ile Phe Thr Ser Asp Asn Gly Ala His Leu Glu 

325 330 335 

Gly Thr Pro Glu Trp Tyr Gly Gly Gly Asn Gly Pro Leu Lys Gly Gly 



Val Arg Trp Pro Gly Gly Ile Ala Pro Ala Gly Arg Val Lys Glu Lys 

Ala P 

3' 
Ala Gly Ala Pro Leu Pro Lys Val Ala Asn Gly Ala Lys Asp Arg Pro 

405 410 415 

Leu Asp Gly Val Ser Leu Leu Pro Leu Leu Leu Gly Gly Ala Ala Pro 

420 425 430 

Ser Arg Arg Ala His Glu Thr Leu Phe His Tyr Asn Gly Lys Gly Arg 

435 440 445 

Lys Leu Arg Ala Val Arg Trp Pro Arg Lys Ser Gly Lys Thr Pro Lys 

450 455 460 

Leu Lys Ala His Phe Phe Thr Pro Ala Phe Asp Asp Asp Thr Asn Asn 
465 470 475 480 

Gly Trp Glu Cys Val Gly Thr Val Ser Gln Ala Asp Asp Ile Glu Asp 

485 490 495 

Cys Arg Cys Glu Gly Val Glu Thr Val Thr His His Asp Pro Pro Glu 

500 505 510 

Leu Tyr Asp Leu Ser Arg Asp Pro 
515 520 

<210> 11 

<211> 1578 

<212> DNA 

<213> homo sapiens 

<400> 11 

atgggctggc tttttctaaa ggttttgttg gcgggagtga gtttctcagg atttctttat 60 

cctcttgtgg atttttgcat cagtgggaaa acaagaggac agaagccaaa ctttgtgatt 120 

attttggccg atgacatggg gtggggtgac ctgggagcaa actgggcaga aacaaaggac 180 

actgccaacc ttgataagat ggcttcggag ggaatgaggt ttgtggattt ccatgcagct 240 

gcctccacct gctcaccctc ccgggcttcc ttgctcaccg gccggcttgg ccttcgcaat 300 

ggagtcacac gcaactttgc agtcacttct gtgggaggcc ttccgctcaa cgagaccacc 360 

ttggcagagg tgctgcagca ggcgggttac gtcactggga taataggcaa atggcatctt 420 

ggacaccacg gctcttatca ccccaacttc cgtggttttg attactactt tggaatccca 480 

tatagccatg atatgggctg tactgatact ccaggctaca accaccctcc ttgtccagcg 540 

tgtccacagg gtgatggacc atcaaggaac cttcaaagag actgttacac tgacgtggcc 600 

dtccctcttt atgaaaacct caacattgtg gagcagccgg tgaacttgag cagccttgcc 660 

cagaagtatg ctgagaaagc aacccagttc atccagcgtg caagcaccag cgggaggccc 720 

ttcctgctct atgtggctct ggcccacatg cacgtgccct tacccgtgac tcagctacca 780 

gcagcgccac ggggcagaag cctgtatggt ggagggctct gggagatgga cagtctggtg 840 

ggccagatca aggacaaagt tgaccacaca gtgaaggaaa acacattcct ctggtttaca 900 

ggagacaatg gcccgtgggc tcagaagtgt gagctagcgg gcagtgtggg tcccttcact 960 

ggattttggc aaactcgtca agggggaagt ccagccaagc agacgacctg ggaaggaggg 1020 

caccgggtcc cagcactggc ttactggcct ggcagagttc cagttaatgt caccagcact 1080 

gccttgttaa gcgtgctgga catttttcca actgtggtag ccctggccca ggccagctta 1140 

cctcaaggac ggcgctttga tggtgtggac gtctccgagg tgctctttgg ccggtcacag 1200 

cctgggcaca gggtgctgtt ccaccccaac agcggggcag ctggagagtt tggagccctg 1260 

cagactgtcc gcctggagcg ttacaaggcc ttctacatta ccggtggagc cagggcgtgt 1320 

gatgggagca cggggcctga gctgcagcat aagtttcctc tgattttcaa cctggaagac 1380 

gataccgcag aagctgtgcc cctagaaaga ggtggtgcgg agtaccaggc tgtgctgccc 1440 

gaggtcagaa aggttcttgc agacgtcctc caagacattg ccaacgacaa catctccagc 1500 

gcagattaca ctcaggaccc ttcagtaact ccctgctgta atccctacca aattgcctgc 1560 

cgctgtcaag ccgcataa 1578 

<210> 12 

<211> 2616 

<212> DNA 

<213> homo sapiens 

<400> 12 

atgaagtatt cttgctgtgc tctggttttg gctgtcctgg gcacagaatt gctgggaagc 60 

ctctgttcga ctgtcagatc cccgaggttc agaggacgga tacagcagga acgaaaaaac 120 

atccgaccca acattattct tgtgcttacc gatgatcaag atgtggagct ggggtccctg 180 

caagtcatga acaaaacgag aaagattatg gaacatgggg gggccacctt catcaatgcc 240 

tttgtgacta cacccatgtg ctgcccgtca cggtcctcca tgctcaccgg gaagtatgtg 300 

cacaatcaca atgtctacac caacaacgag aactgctctt ccccctcgtg gcaggccatg 360 

catgagcctc ggacttttgc tgtatatctt aacaacactg gctacagaac agcctttttt 420 

ggaaaatacc tcaatgaata taacggcagc tacatccccc ctgggtggcg agaatggctt 480 

ggattaatca agaattctcg cttctataat tacactgttt gtcgcaatgg catcaaagaa 540 

aagcatggat ttgattatgc aaaggactac ttcacagact taatcactaa cgagagcatt 600 

aattacttca aaatgtctaa gagaatgtat ccccataggc ccgttatgat ggtgatcagc 660 

cacgctgcgc cccacggccc cgaggactca gccccacagt tttctaaact gtaccccaat 720 

gcttcccaac acataactcc tagttataac tatgcaccaa atatggataa acactggatt 780 

atgcagtaca caggaccaat gctgcccatc cacatggaat ttacaaacat tctacagcgc 840 

aaaaggctcc agactttgat gtcagtggat gattctgtgg agaggctgta taacatgctc 900 

gtggagacgg gggagctgga gaatacttac atcatttaca ccgccgacca tggttaccat 960 

attgggcagt ttggactggt caaggggaaa tccatgccat atgactttga tattcgtgtg 1020 

ccttttttta ttcgtggtcc aagtgtagaa ccaggatcaa tagtcccaca gatcgttctc 1080 

aacattgact tggcccccac gatcctggat attgctgggc tcgacacacc tcctgatgtg 1140 

gacggcaagt ctgtcctcaa acttctggac ccagaaaagc caggtaacag gtttcgaaca 1200 

aacaagaagg ccaaaatttg gcgtgataca ttcctagtgg aaagaggcaa atttctacgt 1260 

aagaaggaag aatccagcaa gaatatccaa cagtcaaatc acttgcccaa atatgaacgg 1320 

gtcaaagaac tatgccagca ggccaggtac cagacagcct gtgaacaacc ggggcagaag 1380 

tggcaatgca ttgaggatac atctggcaag cttcgaattc acaagtgtaa aggacccagt 1440 

gacctgctca cagtccggca gagcacgcgg aacctctacg ctcgcggctt ccatgacaaa 1500 

gacaaagagt gcagttgtag ggagtctggt taccgtgcca gcagaagcca aagaaagagt 1560 

caacggcaat tcttgagaaa ccaggggact ccaaagtaca agcccagatt tgtccatact 1620 

cggcagacac gttccttgtc cgtcgaattt gaaggtgaaa tatatgacat aaatctggaa 1680 

gaagaagaag aattgcaagt gttgcaacca agaaacattg ctaagcgtca tgatgaaggc 1740 

cacaaggggc caagagatct ccaggcttcc agtggtggca acaggggcag gatgctggca 1800 

gatagcagca acgccgtggg cccacctacc actgtccgag tgacacacaa gtgttttatt 1860

cttcccaatg actctatcca ttgtgagaga gaactgtacc aatcggccag agcgtggaag 1920 

gaccataagg catacattga caaagagatt gaagctctgc aagataaaat taagaattta 1980

agagaagtga gaggacatct gaagagaagg aagcctgagg aatgtagctg cagtaaacaa 2040 

agctattaca ataaagagaa aggtgtaaaa aagcaagaga aattaaagag ccatcttcac 2100 

ccattcaagg aggctgctca ggaagtagat agcaaactgc aacttttcaa ggagaacaac 2160 

cgtaggagga agaaggagag gaaggagaag agacggcaga ggaaggggga agagtgcagc 2220 

ctgcctggcc tcacttgctt cacgcatgac aacaaccact ggcagacagc cccgttctgg 2280

aacctgggat ctttctgtgc ttgcacgagt tctaacaata acacctactg gtgtttgcgt 2340 

acagttaatg agacgcataa ttttcttttc tgtgagtttg ctactggctt tttggagtat 2400 

tttgatatga atacagatcc ttatcagctc acaaatacag tgcacacggt agaacgaggc 2460 

attttgaatc agctacacgt acaactaatg gagctcagaa gctgtcaagg atataagcag 2520

tgcaacccaa gacctaagaa tcttgatgtt ggaaataaag atggaggaag ctatgaccta 2580

cacagaggac agttatggga tggatgggaa ggttaa 2616 

<210> 13 

<211> 1710

<212> DNA 

<213> homo sapiens 

<400> 13 

atgcacaccc tcactggctt ctctctggtc agcctgctca gcttcggcta cctgtcctgg 60 

gactgggcca agccgagctt cgtggccgac gggcccgggg aggctggcga gcagccctcg 120 

gccgctccgc cccagcctcc ccacatcatc ttcatcctca cggacgacca aggctaccac 180 

gacgtgggct accatggttc agatatcgag acccctacgc tggacaggct ggcggccaag 240 

ggggtcaagt tggagaatta ttacatccag cccatctgca cgccttcgcg gagccagctc 300 

ctcactggca ggtaccagat ccacacagga ctccagcatt ccatcatccg cccacagcag 360 

cccaactgcc tgcccctgga ccaggtgaca ctgccacaga agctgcagga ggcaggttat 420 

tccacccata tggtgggcaa gtggcacctg ggcttctacc ggaaggagtg tctgcccacc 480 

cgtcggggct tcgacacctt cctgggctcg ctcacgggca atgtggacta ttacacctat 540 

gacaactgtg atggcccagg cgtgtgcggc ttcgacctgc acgagggtga gaatgtggcc 600

tgggggctca gcggccagta ctccactatg ctttacgccc agcgcgccag ccatatcctg 660 

gccagccaca gccctcagcg tcccctcttc ctctatgtgg ccttccaggc agtacacaca 720 

cccctgcagt cccctcgtga gtacctgtac cgctaccgca ccatgggcaa tgtggcccgg 780

cggaagtacg cggccatggt gacctgcatg gatgaggctg tgcgcaacat cacctgggcc 840 

ctcaagcgct acggtttcta caacaacagt gtcatcatct tctccagtga caatggtggc 900

cagactttct cggggggcag caactggccg ctccgaggac gcaagggcac ttattgggaa 960 

ggtggcgtgc ggggcctagg ctttgtccac agtcccctgc tcaagcgaaa gcaacggaca 1020 

agccgggcac tgatgcacat cactgactgg tacccgaccc tggtgggtct ggcaggtggt 1080 

accacctcag cagccgatgg gctagatggc tacgacgtgt ggccggccat cagcgagggc 1140

cgggcctcac cacgcacgga gatcctgcac aacattgacc cactctacaa ccatgcccag 1200 

catggctccc tggagggcgg ctttggcatc tggaacaccg ccgtgcaggc tgccatccgc 1260 

gtgggtgagt ggaagctgct gacaggagac cccggctatg gcgattggat cccaccgcag 1320 

acactggcca ccttcccggg tagctggtgg aacctggaac gaatggccag tgtccgccag 1380 

gccgtgtggc tcttcaacat cagtgctgac ccttatgaac gggaggacct ggctggccag 1440

cggcctgatg tggtccgcac cctgctggct cgcctggccg aatataaccg cacagccatc 1500 

ccggtacgct acccagctga gaacccccgg gctcatcctg actttaatgg gggtgcttgg 1560 
gggccctggg ccagtgatga ggaagaggag gaagaggaag ggagggctcg aagcttctcc 1620

cggggtcgtc gcaagaaaaa atgcaagatt tgcaagcttc gatccttttt ccgtaaactc 1680

aacaccaggc taatgtccca acggatctga 1710 

<210> 14 

<211> 1800 

<212> DNA

<213> homo sapiens 

<400> 14 

atggctccca ggggctgtgc ggggcatccg cctccgcctt ctccacaggc ctgtgtctgt 60 

cctggaaaga tgctagcaat gggggcgctg gcaggattct ggatcctctg cctcctcact 120 

tatggttacc tgtcctgggg ccaggcctta gaagaggagg aagaaggggc cttactagct 180 

caagctggag agaaactaga gcccagcaca acttccacct cccagcccca tctcattttc 240 

atcctagcgg atgatcaggg atttagagat gtgggttacc acggatctga gattaaaaca 300 

cctactcttg acaagctcgc tgccgaagga gttaaactgg agaactacta tgtccagcct 360 

atttgcacac catccaggag tcagtttatt actggaaagt atcagataca caccggactt 420 

caacattcta tcataagacc tacccaaccc aactgtttac ctctggacaa tgccacccta 480

cctcagaaac tgaaggaggt tggatattca acgcatatgg tcggaaaatg gcacttgggt 540 

ttttacagaa aagaatgcat gcccaccaga agaggatttg ataccttttt tggttccctt 600 

ttgggaagtg gggattacta tacacactac aaatgtgaca gtcctgggat gtgtggctat 660 

gacttgtatg aaaacgacaa tgctgcctgg gactatgaca atggcatata ctccacacag 720

atgtacactc agagagtaca gcaaatctta gcttcccata accccacaaa gcctatattt 780 

ttatatattg cctatcaagc tgttcattca ccactgcaag ctcctggcag gtatttcgaa 840

cactaccgat ccattatcaa cataaacagg aggagatatg ctgccatgct ttcctgctta 900 

gatgaagcaa tcaacaacgt gacattggct ctaaagactt atggtttcta taacaacagc 960 

attatcattt actcttcaga taatggtggc cagcctacgg caggagggag taactggcct 1020 

ctcagaggta gcaaaggaac atattgggaa ggagggatcc gggctgtagg ctttgtgcat 1080 

agcccacttc tgaaaaacaa gggaacagtg tgtaaggaac ttgtgcacat cactgactgg 1140 

taccccactc tcatttcact ggctgaagga cagattgatg aggacattca actagatggc 1200

tatgatatct gggagaccat aagtgagggt cttcgctcac cccgagtaga tattttgcat 1260

aacattgacc ccatatacac caaggcaaaa aatggctcct gggcagcagg ctatgggatc 1320 

tggaacactg caatccagtc agccatcaga gtgcagcact ggaaattgct tacaggaaat 1380 

cctggctaca gcgactgggt cccccctcag tctttcagca acctgggacc gaaccggtgg 1440 

cacaatgaac ggatcacctt gtcaactggc aaaagtgtat ggcttttcaa catcacagcc 1500

gacccatatg agagggtgga cctatctaac aggtatccag gaatcgtgaa gaagctccta 1560 

cggaggctct cacagttcaa caaaactgca gtgccggtca ggtatccccc caaagacccc 1620

agaagtaacc ctaggctcaa tggaggggtc tggggaccat ggtataaaga ggaaaccaag 1680 

aaaaagaagc caagcaaaaa tcaggctgag aaaaagcaaa agaaaagcaa aaaaaagaag 1740 

aagaaacagc agaaagcagt ctcaggttca acttgccatt caggtgttac ttgtggataa 1800